Re: [FFmpeg-devel] [EXTERNAL] Re: [PATCH] Boost FPS and performance: Optimize vertical loop for cache-friendly access [libavcodec/jpeg2000dwt.c:dwt_decode97_float]

2025-05-16 Thread Chitra Dey Sarkar via ffmpeg-devel
On Thu, May 15, 2025 at 10:19:57PM +0200, Michael Niedermayer wrote: > Hi > > On Wed, May 14, 2025 at 06:40:03PM +0200, Michael Niedermayer wrote: > > Hi Chitra > > > > On Wed, May 14, 2025 at 03:55:59AM +, Chitra Dey Sarkar

Re: [FFmpeg-devel] [EXTERNAL] Re: [PATCH] Boost FPS and performance: Optimize vertical loop for cache-friendly access [libavcodec/jpeg2000dwt.c:dwt_decode97_float]

2025-05-14 Thread Chitra Dey Sarkar via ffmpeg-devel
Hi Chitra On Wed, May 14, 2025 at 03:55:59AM +, Chitra Dey Sarkar via ffmpeg-devel wrote: > Original Implementation: > In the original implementation, the "VER_SD" section processes image data >

[FFmpeg-devel] Test email from Chitra - chitra....@microsoft.com

2025-05-13 Thread Chitra Dey Sarkar via ffmpeg-devel
This is a test email

[FFmpeg-devel] [PATCH] Boost FPS and performance: Optimize vertical loop for cache-friendly access [libavcodec/jpeg2000dwt.c:dwt_decode97_float]

2025-05-13 Thread Chitra Dey Sarkar via ffmpeg-devel
Original Implementation: In the original implementation, the "VER_SD" section processes image data stored in *data using strided memory access in a vertical fashion. This leads to inefficient memory access patterns and cache thrashing due to non-sequential data a
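
The access-pattern issue described in this message can be illustrated with a minimal sketch. It is not the code from libavcodec/jpeg2000dwt.c or from the patch; the function names, the lifting-style update, and the coefficient k are all hypothetical and chosen only to show the same vertical update written in a strided order and in a row-wise order over a row-major buffer.

```c
#include <stddef.h>

/* Hypothetical lifting-style vertical update on a row-major w*h buffer:
 * each odd row is updated from the even rows directly above and below it.
 * This is an illustration only, not FFmpeg's dwt_decode97_float. */

/* Column-outer order: consecutive inner-loop accesses are w floats apart,
 * so each access tends to touch a different cache line. */
static void lift_vertical_strided(float *data, size_t w, size_t h, float k)
{
    for (size_t x = 0; x < w; x++)
        for (size_t y = 1; y + 1 < h; y += 2)
            data[y * w + x] += k * (data[(y - 1) * w + x] + data[(y + 1) * w + x]);
}

/* Row-outer order: the inner loop walks three rows contiguously.
 * The result is the same because odd rows only read even rows,
 * which this step never modifies. */
static void lift_vertical_rowwise(float *data, size_t w, size_t h, float k)
{
    for (size_t y = 1; y + 1 < h; y += 2) {
        const float *above = data + (y - 1) * w;
        const float *below = data + (y + 1) * w;
        float *cur = data + y * w;
        for (size_t x = 0; x < w; x++)
            cur[x] += k * (above[x] + below[x]);
    }
}
```

Both functions compute identical values; only the traversal order differs. Whether the row-wise order is faster in practice depends on image width, cache size, and the compiler, so it would need to be measured.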

[FFmpeg-devel] [PATCH] Boost FPS and performance: Optimize vertical loop for cache-friendly access [libavcodec/jpeg2000dwt.c:dwt_decode97_float]

2025-05-14 Thread Chitra Dey Sarkar via ffmpeg-devel
From d074ea81c12132e3a92211679adbe2d2cb4d5a69 Mon Sep 17 00:00:00 2001 From: ChitraDeySarkar Date: Wed, 14 May 2025 11:51:35 -0700 Subject: [PATCH] Boost FPS and performance: Optimize vertical loop for cache-friendly access [libavcodec/jpeg2000dwt.c:dwt_decode97_float] From: chdey...@microsoft.co

[FFmpeg-devel] libswscale.c : ff_xyz12Torgb48 expensive unaligned 16 byte accesses

2025-05-25 Thread Chitra Dey Sarkar via ffmpeg-devel
Hi, We have been profiling FFmpeg at Microsoft and have identified that ff_xyz12ToRgb48 has a high sample count (profiled every 1 ms). It seems that ff_xyz12ToRgb48 incurs a performance penalty for: 1. Unaligned read and write access 2. Access to xyz2rgb_matrix 3. Multiplication I would be int