Maybe we should include the pixman list in this. In case you're not
subscribed, I'm forwarding it to that list now.
On Tue, 7 Feb 2023, Akihiko Odaki wrote:
On 2023/02/06 4:16, Richard Henderson wrote:
On 2/5/23 08:44, BALATON Zoltan wrote:
On Sun, 5 Feb 2023, Richard Henderson wrote:
On 2/4/23 06:57, BALATON Zoltan wrote:
This has just bounced, I hoped to still be able to post after moderation
but now I'm resending it after subscribing to the pixman list. Meanwhile
I've found this ticket as well:
https://gitlab.freedesktop.org/pixman/pixman/-/merge_requests/71
See the rest of the message below. Looks like this is being worked on
but I'm not sure how far it is from getting resolved. Any info on that?
Please try this:
https://gitlab.freedesktop.org/rth7680/pixman/-/tree/general
It provides a pure C version for ultimate fallback.
Unfortunately, there are no test cases for this, nor documentation.
It can share the implementation with fast_composite_src_memcpy().
fast_composite_src_memcpy() should be well-tested with the tests for
pixman_image_composite(). arm-neon does something similar, so we can trust
that fast_composite_src_memcpy() works as blt.
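
To make the idea concrete, here is a rough sketch of a blt built on a
row-by-row memcpy(). It is not the pixman code: sketch_blt32() is a
hypothetical name, it handles only 32 bpp, it assumes the source and
destination do not overlap, and it assumes strides are counted in uint32_t
units the way pixman_blt() takes them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of pixman: copy a width x height rectangle
 * of 32 bpp pixels. Strides are in uint32_t units (assumption). The source
 * and destination rectangles must not overlap, since memcpy() is used. */
static void sketch_blt32(const uint32_t *src, uint32_t *dst,
                         int src_stride, int dst_stride,
                         int src_x, int src_y,
                         int dst_x, int dst_y,
                         int width, int height)
{
    /* Point at the first pixel of each rectangle. */
    const uint32_t *s = src + (ptrdiff_t)src_y * src_stride + src_x;
    uint32_t *d = dst + (ptrdiff_t)dst_y * dst_stride + dst_x;

    for (int row = 0; row < height; row++) {
        /* One memcpy() per scanline; rows are contiguous, images need not be. */
        memcpy(d, s, (size_t)width * sizeof(uint32_t));
        s += src_stride;
        d += dst_stride;
    }
}
```

Other depths would only change the byte count per row and the offset
arithmetic, which is presumably why one memcpy-based routine can serve both
the composite fast path and blt.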
Thanks, I don't have hardware to test this but maybe Akihiko or somebody
else here can try. Do you think pixman_fill won't have the same problem?
It seems to have at least a fast_path implementation but I'm not sure how
pixman selects these.
For fill, I think the fast_path implementation should work, so long as it
isn't disabled via environment variable. I'm not sure why that is, and why
_fast_path isn't part of _general.
The implementation of fill should be moved to pixman-general.c but the other
part of pixman-fast-path.c shouldn't be.
By isolating the non-essential fast-path code to pixman-fast-path.c, you can
disable it with the environment variable when you are not confident in the
implementation, and that may help debugging. However, if pixman-fast-path.c
has some essential code like the implementation of fill, the utility of the
environment variable will be impaired as setting the environment variable may
break things.
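
A toy model of the fallback chain may show the concern more clearly. This is
only a sketch under my own assumptions: struct impl, chain_fill() and
toy_fill32() are hypothetical names, and the real pixman structures carry
much more state. The point is just that if the only fill lives in the
fast-path layer, removing that layer from the chain (which I believe is what
PIXMAN_DISABLE=fast does) leaves nothing to fall back to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill callback: returns true if it handled the request. */
typedef bool (*fill_fn)(uint32_t *bits, int stride, int x, int y,
                        int width, int height, uint32_t filler);

/* Hypothetical, heavily simplified implementation layer. */
struct impl {
    const char *name;
    fill_fn fill;                 /* NULL if this layer has no fill */
    const struct impl *fallback;  /* next layer to try, or NULL */
};

/* Plain C 32 bpp fill; stride is in uint32_t units (assumption). */
static bool toy_fill32(uint32_t *bits, int stride, int x, int y,
                       int width, int height, uint32_t filler)
{
    for (int j = 0; j < height; j++)
        for (int i = 0; i < width; i++)
            bits[(ptrdiff_t)(y + j) * stride + x + i] = filler;
    return true;
}

/* Walk the chain until some layer handles the fill. */
static bool chain_fill(const struct impl *imp, uint32_t *bits, int stride,
                       int x, int y, int width, int height, uint32_t filler)
{
    for (; imp != NULL; imp = imp->fallback)
        if (imp->fill && imp->fill(bits, stride, x, y, width, height, filler))
            return true;
    return false;  /* no layer could do it */
}
```

With a chain of fast -> general where only "fast" has a fill, chain_fill()
succeeds; starting from "general" alone it returns false. Giving "general"
its own fill makes the result independent of whether "fast" is present.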
Indeed, the fast_path implementation of fill should be easily vectorized by
the compiler. I would expect it to be competitive with an assembly
implementation. I would expect the implementation chain design to only be
useful when multiple vector implementations are supported and selected at
runtime -- e.g. the x86 SSE2 vs SSSE3 stuff.
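
For reference, the kind of loop I have in mind looks roughly like the sketch
below. It is not the actual fast_path code: sketch_fill32() is a
hypothetical name, only 32 bpp is handled, and the stride is assumed to be
in uint32_t units as with pixman_fill(). The inner loop is a plain store of
a loop-invariant value to consecutive addresses, which is a pattern
compilers can typically auto-vectorize at higher optimization levels.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of pixman: fill a width x height rectangle
 * of 32 bpp pixels with a constant. Stride is in uint32_t units (assumption). */
static void sketch_fill32(uint32_t *bits, int stride,
                          int x, int y, int width, int height,
                          uint32_t filler)
{
    uint32_t *row = bits + (ptrdiff_t)y * stride + x;

    for (int j = 0; j < height; j++) {
        /* Contiguous stores of one value: a simple candidate for
         * compiler vectorization. */
        for (int i = 0; i < width; i++)
            row[i] = filler;
        row += stride;
    }
}
```

Whether this matches a hand-written assembly version would need measuring on
the target; the sketch only shows that nothing in the operation itself
requires architecture-specific code.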
r~