Add an optimized Neon code path for the little-endian case of the
xyz12Torgb48 function. The innermost loop processes the data in 4x2
pixel blocks using software gathers, with the matrix multiplication
and clipping done by Neon.
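For orientation, the per-pixel work that such a 4x2 block covers can be
sketched in plain C. This is only an illustration under assumptions, not
the code from the patch: the SketchXform type, its field names, the Q12
matrix scaling and the host-endian sample layout are all hypothetical
stand-ins. The table lookups are the part a SIMD version would have to
perform as software gathers; the multiply and clip are the part that
vectorizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the conversion state; not the FFmpeg layout. */
typedef struct SketchXform {
    uint16_t gamma_in[4096];  /* 12-bit XYZ code value -> linear value */
    uint16_t gamma_out[4096]; /* linear value -> 12-bit RGB code value */
    int16_t  mat[3][3];       /* 3x3 matrix, assumed Q12 fixed point   */
} SketchXform;

/* Convert one pixel: lookup, 3x3 multiply, clip to 12 bits, lookup. */
static void sketch_pixel(const SketchXform *c, const uint16_t *src,
                         uint16_t *dst)
{
    /* Samples are assumed to sit in the top 12 bits of each 16-bit word. */
    int x = c->gamma_in[src[0] >> 4];
    int y = c->gamma_in[src[1] >> 4];
    int z = c->gamma_in[src[2] >> 4];

    for (int i = 0; i < 3; i++) {
        int sum = c->mat[i][0] * x + c->mat[i][1] * y + c->mat[i][2] * z;
        int v   = sum < 0 ? 0 : sum >> 12; /* clip low, drop Q12 scale */
        if (v > 4095)
            v = 4095;                      /* clip high to 12 bits     */
        dst[i] = (uint16_t)(c->gamma_out[v] << 4);
    }
}

/* Convert a block of 4 pixels by 2 rows; strides are in uint16_t units. */
static void sketch_block_4x2(const SketchXform *c,
                             const uint16_t *src, ptrdiff_t src_stride,
                             uint16_t *dst, ptrdiff_t dst_stride)
{
    assert(c && src && dst);
    for (int row = 0; row < 2; row++)
        for (int col = 0; col < 4; col++)
            sketch_pixel(c, src + row * src_stride + 3 * col,
                         dst + row * dst_stride + 3 * col);
}
```
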
Relative runtime of microbenchmarks after this patch on some
Cortex and
Add checkasm coverage for the XYZ12LE to RGB48LE path via the
ctx->xyz12Torgb48 hook. Integrate the test into the build and runner,
exercise a variety of widths/heights, compare against the C reference,
and benchmark when the width is a multiple of 4.
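The shape of such a test can be sketched without the checkasm framework.
The code below is a standalone illustration only: it does not use the
real checkasm macros, and the function names, the size lists and the
placeholder conversion are all hypothetical. It runs a reference and a
candidate implementation over several widths and heights, compares the
outputs, and counts the sizes that would qualify for benchmarking
(width a multiple of 4).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_SAMPLES (64 * 16 * 3) /* max width * height * channels */

typedef void (*sketch_fn)(uint16_t *dst, const uint16_t *src, int w, int h);

/* Placeholder "C reference": one sample at a time. */
static void sketch_ref(uint16_t *dst, const uint16_t *src, int w, int h)
{
    for (int i = 0; i < w * h * 3; i++)
        dst[i] = (uint16_t)(src[i] & 0xFFF0);
}

/* Placeholder "optimized" version: four samples per step plus a tail. */
static void sketch_opt(uint16_t *dst, const uint16_t *src, int w, int h)
{
    int n = w * h * 3, i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = (uint16_t)(src[i]     & 0xFFF0);
        dst[i + 1] = (uint16_t)(src[i + 1] & 0xFFF0);
        dst[i + 2] = (uint16_t)(src[i + 2] & 0xFFF0);
        dst[i + 3] = (uint16_t)(src[i + 3] & 0xFFF0);
    }
    for (; i < n; i++)
        dst[i] = (uint16_t)(src[i] & 0xFFF0);
}

/* Returns the number of sizes whose output differs from the reference.
 * *bench_count receives the number of sizes with width % 4 == 0. */
static int sketch_check_sizes(sketch_fn ref, sketch_fn opt, int *bench_count)
{
    static const int widths[]  = { 1, 2, 3, 4, 8, 17, 64 }; /* hypothetical */
    static const int heights[] = { 1, 2, 16 };              /* hypothetical */
    static uint16_t src[SKETCH_MAX_SAMPLES];
    static uint16_t out_ref[SKETCH_MAX_SAMPLES], out_new[SKETCH_MAX_SAMPLES];
    uint32_t seed = 1;
    int failures = 0;

    *bench_count = 0;
    for (size_t hi = 0; hi < sizeof(heights) / sizeof(heights[0]); hi++) {
        for (size_t wi = 0; wi < sizeof(widths) / sizeof(widths[0]); wi++) {
            int w = widths[wi], h = heights[hi], n = w * h * 3;
            assert(n <= SKETCH_MAX_SAMPLES);
            for (int i = 0; i < n; i++) {  /* deterministic LCG input */
                seed   = seed * 1664525u + 1013904223u;
                src[i] = (uint16_t)(seed >> 16);
            }
            /* Different fill patterns expose samples left unwritten. */
            memset(out_ref, 0x00, sizeof(out_ref));
            memset(out_new, 0xAA, sizeof(out_new));
            ref(out_ref, src, w, h);
            opt(out_new, src, w, h);
            if (memcmp(out_ref, out_new, (size_t)n * sizeof(uint16_t)))
                failures++;
            if (w % 4 == 0)
                (*bench_count)++;          /* benchmark candidates only */
        }
    }
    return failures;
}
```
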
This improves test coverage for the new function poi
Prepare for xyz12Torgb48 architecture-specific optimizations in
subsequent patches by:
- Grouping XYZ+RGB gamma LUTs and 3x3 matrices into ColorXform
  (ctx->xyz2rgb / ctx->rgb2xyz), replacing scattered fields.
- Dropping the unused last matrix column, giving the same or smaller
  SwsInternal siz
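A rough picture of the grouping, as a sketch only: the struct and field
names below are hypothetical and do not reproduce the actual ColorXform
definition. It assumes the previous layout stored each matrix with four
columns, which is why dropping the unused column cannot make the
per-matrix storage larger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical grouping of one direction of the colour transform. */
typedef struct SketchColorXform {
    const uint16_t *gamma_in;  /* LUT applied before the matrix */
    const uint16_t *gamma_out; /* LUT applied after the matrix  */
    int16_t         mat[3][3]; /* 3x3 matrix, no padding column */
} SketchColorXform;

/* Hypothetical context holding both directions side by side. */
typedef struct SketchContext {
    SketchColorXform xyz2rgb;
    SketchColorXform rgb2xyz;
} SketchContext;

/* Fill the matrix with identity in an assumed Q12 fixed-point scale. */
static void sketch_xform_set_identity(SketchColorXform *x)
{
    assert(x);
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            x->mat[i][j] = (i == j) ? 4096 : 0;
}
```

With 16-bit coefficients, a 3x3 matrix occupies 18 bytes against 24 for
a 3x4 one, so the saving per matrix is small but never negative.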
Hi,
This series prepares and optimizes the xyz12Torgb48 path in swscale.
Patch 1 refactors the XYZ/RGB state into a ColorXform struct and adds a
per-context xyz12Torgb48 hook with no functional changes.
Patch 2 adds checkasm coverage for the xyz12Torgb48le path.
Patch 3 introduces an AArch64 Ne