[FFmpeg-devel] [PATCH 3/3] swscale: Add AArch64 Neon path for xyz12Torgb48 LE

2025-11-26 Thread Arpad Panyik via ffmpeg-devel
Add an optimized Neon code path for the little-endian case of the xyz12Torgb48 function. The innermost loop processes the data in 4x2 pixel blocks using software gathers, with the matrix multiplication and clipping done by Neon. Relative runtime of microbenchmarks after this patch on some Cortex and
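For orientation, the per-pixel work that such a path vectorizes can be sketched in scalar C. This is only an illustration under stated assumptions, not the patch's code: the struct and function names are hypothetical, 12-bit samples are assumed to sit in the upper bits of each 16-bit word, and the matrix is assumed to be fixed point with 12 fractional bits. The gamma table lookups here correspond to what the summary calls software gathers; the multiply and clip are the part it assigns to Neon.

```c
#include <stdint.h>

#define LUT_SIZE 4096  /* one entry per 12-bit code value (assumption) */

/* Hypothetical container; field names are illustrative only. */
typedef struct XformSketch {
    uint16_t gamma_in[LUT_SIZE];   /* linearization LUT, 12-bit in and out */
    uint16_t gamma_out[LUT_SIZE];  /* output gamma LUT, 12-bit in and out  */
    int16_t  mat[3][3];            /* 3x3 matrix, 12 fractional bits       */
} XformSketch;

/* Clamp to the unsigned 12-bit range [0, 4095]. */
static int clip_u12(int v)
{
    return v < 0 ? 0 : (v > 4095 ? 4095 : v);
}

/* Convert one XYZ pixel (three 16-bit words) to one RGB pixel. */
static void xyz12_to_rgb48_pixel(const XformSketch *t,
                                 const uint16_t src[3], uint16_t dst[3])
{
    int in[3];

    /* Table lookups: drop the 4 padding bits, then linearize. */
    for (int i = 0; i < 3; i++)
        in[i] = t->gamma_in[src[i] >> 4];

    for (int i = 0; i < 3; i++) {
        /* One matrix row times the input vector. */
        int acc = t->mat[i][0] * in[0]
                + t->mat[i][1] * in[1]
                + t->mat[i][2] * in[2];
        /* Remove the 12 fractional bits, then clip to 12 bits. */
        int v = clip_u12(acc / 4096);
        /* Apply the output gamma and move back to the upper bits. */
        dst[i] = (uint16_t)(t->gamma_out[v] << 4);
    }
}
```

With identity tables and a matrix of 4096 on the diagonal, a pixel passes through unchanged, which makes a convenient sanity check for a sketch like this.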

[FFmpeg-devel] [PATCH 2/3] checkasm: Add xyz12Torgb48le test

2025-11-26 Thread Arpad Panyik via ffmpeg-devel
Add checkasm coverage for the XYZ12LE to RGB48LE path via the ctx->xyz12Torgb48 hook. Integrate the test into the build and runner, exercise a variety of widths/heights, compare against the C reference, and benchmark when the width is a multiple of 4. This improves test coverage for the new function poi

[FFmpeg-devel] [PATCH 1/3] swscale: Refactor XYZ+RGB state and add xyz12Torgb48 hook

2025-11-26 Thread Arpad Panyik via ffmpeg-devel
Prepare for xyz12Torgb48 architecture-specific optimizations in subsequent patches by:
- Grouping XYZ+RGB gamma LUTs and 3x3 matrices into ColorXform (ctx->xyz2rgb/ctx->rgb2xyz), replacing scattered fields.
- Dropping the unused last matrix column giving the same or smaller SwsInternal siz
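The general shape of such a refactor (grouped transform state plus a per-context function pointer that an architecture-specific init may override) can be sketched as follows. This is a rough illustration only: every type, field, and function name below is hypothetical, the hook signature is an assumption, and the "reference" routine is a byte-copy placeholder rather than a real conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical grouping of per-direction state (names are assumptions). */
typedef struct ColorXformSketch {
    uint16_t gamma_in[4096];
    uint16_t gamma_out[4096];
    int16_t  mat[3][3];       /* 3x3 only, no extra column */
} ColorXformSketch;

struct CtxSketch;

/* Hypothetical hook signature: strides in bytes, w and h in pixels. */
typedef void (*xyz12_to_rgb48_fn)(const struct CtxSketch *c,
                                  uint8_t *dst, int dst_stride,
                                  const uint8_t *src, int src_stride,
                                  int w, int h);

typedef struct CtxSketch {
    ColorXformSketch  xyz2rgb;
    ColorXformSketch  rgb2xyz;
    xyz12_to_rgb48_fn xyz12Torgb48;  /* per-context hook */
} CtxSketch;

/* Placeholder "C reference": copies 6 bytes per pixel, row by row. */
static void xyz12_to_rgb48_c_sketch(const CtxSketch *c,
                                    uint8_t *dst, int dst_stride,
                                    const uint8_t *src, int src_stride,
                                    int w, int h)
{
    (void)c;
    for (int y = 0; y < h; y++)
        memcpy(dst + (size_t)y * dst_stride,
               src + (size_t)y * src_stride, (size_t)w * 6);
}

/* Stand-in for an optimized routine; here it just forwards to C. */
static void xyz12_to_rgb48_arch_sketch(const CtxSketch *c,
                                       uint8_t *dst, int dst_stride,
                                       const uint8_t *src, int src_stride,
                                       int w, int h)
{
    xyz12_to_rgb48_c_sketch(c, dst, dst_stride, src, src_stride, w, h);
}

/* Install the C default, then let an arch-specific path override it. */
static void ctx_init_sketch(CtxSketch *c, int have_fast_path)
{
    c->xyz12Torgb48 = xyz12_to_rgb48_c_sketch;
    if (have_fast_path)
        c->xyz12Torgb48 = xyz12_to_rgb48_arch_sketch;
}
```

Callers then invoke the conversion only through `c->xyz12Torgb48(...)`, so a later patch can swap in an optimized routine without touching call sites.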

[FFmpeg-devel] [PATCH 0/3] swscale: refactor and optimize xyz12Torgb48

2025-11-26 Thread Arpad Panyik via ffmpeg-devel
Hi, This series prepares and optimizes the xyz12Torgb48 path in swscale. Patch 1 refactors the XYZ/RGB state into a ColorXform struct and adds a per-context xyz12Torgb48 hook with no functional changes. Patch 2 adds checkasm coverage for the xyz12Torgb48le path. Patch 3 introduces an AArch64 Ne