On Thu, 23 Apr 2015 13:10:10 +0100, Pekka Paalanen <ppaala...@gmail.com> wrote:

> Affine-bench differs from lowlevel-blt-bench in the following:
> - does not test different sized operations fitting to specific caches,
>   destination is always 1920x1080
> - allows defining the affine transformation parameters
> - carefully computes operation extents to hit the COVER_CLIP fast paths
> [...]
> did I capture all the special features affine-bench has over
> lowlevel-blt-bench? I see llbb could use a transform too, and
> was looking at why extending that would be unwanted.

Yes, there's some support in lowlevel-blt-bench for scaled plots, but
it's limited to a single scale factor: the smallest expressible
increment larger than unity, corresponding to an oh-so-slight size
reduction, applied only in the X axis. Because lowlevel-blt-bench
doesn't attempt anything more than that, it can make some
simplifications:

* the source and destination buffers can be the same size as for the
  unscaled case
* the operation automatically satisfies COVER_CLIP_NEAREST (although
  when I was analysing the flags yesterday, I was reminded that before I
  removed the 8*pixman_fixed_e fudge factor, lowlevel-blt-bench's
  bilinear operations were incorrectly calculated to *not* satisfy
  COVER_CLIP_BILINEAR)
* there is no need to add translation offsets into the transform matrix
* one pixel row in = one pixel row out greatly simplifies the
  calculations about what can fit in L1 and L2 caches - this is why I
  deliberately only tested the memory-constrained case in affine-bench.
  For example, in a 90-degree rotation, each new output pixel in a row
  will require reading from a different source cacheline.
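
As a rough illustration of the first and last points, here is a
standalone sketch. It is not code from pixman or from either benchmark:
the typedef, macros and function names are made up for this example, and
the 32-bit pixel format and 32-byte cacheline are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for a 16.16 fixed-point type, so this builds without
 * the pixman headers. */
typedef int32_t fixed_16_16;
#define FIXED_1 ((fixed_16_16) 0x10000) /* 1.0 */
#define FIXED_E ((fixed_16_16) 1)       /* smallest increment: 1/65536 */

#define WIDTH     1920 /* destination size used by affine-bench */
#define HEIGHT    1080
#define BPP       4    /* assumption: 32 bits per pixel */
#define CACHELINE 32   /* assumption: 32-byte cachelines */

/* Source pixel column hit by the centre of destination pixel x when the
 * X scale factor is FIXED_1 + FIXED_E. */
static int
scaled_source_x (int x)
{
    int64_t centre = (int64_t) x * FIXED_1 + FIXED_1 / 2;
    int64_t src = (centre * (FIXED_1 + FIXED_E)) >> 16;

    return (int) (src >> 16);
}

/* Number of distinct cachelines touched when reading `pixels` source
 * pixels whose addresses advance by step_bytes each time. Assumes the
 * first pixel is cacheline-aligned and BPP divides CACHELINE, so no
 * pixel straddles two lines. */
static int
cachelines_touched (size_t step_bytes, int pixels)
{
    size_t last_line = (size_t) -1;
    int count = 0;
    int i;

    for (i = 0; i < pixels; i++)
    {
        size_t line = (size_t) i * step_bytes / CACHELINE;

        if (line != last_line)
        {
            count++;
            last_line = line;
        }
    }
    return count;
}
```

Under these assumptions, scaled_source_x (x) stays equal to x across a
whole 1920-pixel row, because the accumulated drift is only 1920/65536
of a pixel. cachelines_touched (BPP, WIDTH) gives 240 for an unrotated
row, whereas stepping down a column of a source with a 1080-pixel
stride, cachelines_touched (HEIGHT * BPP, WIDTH), gives 1920: one
cacheline per output pixel.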

When I was writing the ARMv6 scaled fetchers, I became aware that they
were going to take quite different code paths in the enlargement vs
reduction cases, as well as when a vertical scaling factor was involved.
I needed to be able to benchmark all these combinations in order to
select the best prefetch distances, if nothing else. I also realised
there was currently no way to benchmark other common affine transforms
that I might want to address in future, such as reflections (including
those used by the reflect repeat type) or even simple rotations. Given
the difficulties of ensuring COVER_CLIP_BILINEAR as well, I decided it
would be easier to write a new benchmarker.
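
To make the combinations above concrete, here is a small sketch of how
the cases might be told apart from the 2x2 linear part of a
destination-to-source transform in 16.16 fixed point. The struct, enum
and function names are hypothetical, and this is not how the ARMv6
fetchers or affine-bench select their code paths.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for a 16.16 fixed-point type (no pixman headers). */
typedef int32_t fixed_16_16;
#define FIXED_1 ((fixed_16_16) 0x10000)

/* Hypothetical holder for the linear part of a transform that maps
 * destination coordinates to source coordinates:
 *   src_x = xx * dst_x + xy * dst_y
 *   src_y = yx * dst_x + yy * dst_y */
struct linear2x2
{
    fixed_16_16 xx, xy, yx, yy;
};

enum scale_kind { SCALE_UNITY, SCALE_ENLARGE, SCALE_REDUCE };

/* A factor below 1.0 steps through the source more slowly than the
 * destination (enlargement); above 1.0 it steps faster (reduction). */
static enum scale_kind
classify_scale (fixed_16_16 factor)
{
    int64_t magnitude = factor < 0 ? -(int64_t) factor : factor;

    if (magnitude == FIXED_1)
        return SCALE_UNITY;
    return magnitude < FIXED_1 ? SCALE_ENLARGE : SCALE_REDUCE;
}

/* No rotation or shear: source X depends only on destination X. */
static int
is_axis_aligned (const struct linear2x2 *m)
{
    return m->xy == 0 && m->yx == 0;
}

static int
has_vertical_scaling (const struct linear2x2 *m)
{
    return is_axis_aligned (m) && classify_scale (m->yy) != SCALE_UNITY;
}

/* At least one axis is traversed backwards, as in a reflection. */
static int
has_flipped_axis (const struct linear2x2 *m)
{
    return is_axis_aligned (m) && (m->xx < 0 || m->yy < 0);
}
```

For example, { FIXED_1 + 1, 0, 0, FIXED_1 } (the lowlevel-blt-bench
case) classifies as a reduction in X with no vertical scaling, while
{ 0, -FIXED_1, FIXED_1, 0 } (a 90-degree rotation) is not axis-aligned
at all.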

I've reviewed your version, looks fine to me. A very minor point: I'm not
sure it's worth making a copy of the transform struct at the start of
bench() because we mostly only use a pointer to the struct thereafter, so
you might as well have kept using bi->transform.

Ben
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
