Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-23 Thread Ben Avison

On Thu, 23 Apr 2015 17:12:59 +0100, I wrote:

I imagine most of the time, you'll have a source image of fixed size, and
you'll either have a target destination size (in which case your task is
to calculate the transform matrix coefficients) or you'll have a target
scaling factor (in which case your task is to work out the destination
size that can be achieved without running off the sides of the source
image).


Just to add to this, an example has come to mind that demonstrates why
higher level software should really know about the pixman_fixed_e offset
for nearest-neighbour scaling in order to make best use of Pixman.

Assume we have a source image that's 101 pixels wide, and we know it has
to be plotted at a scale factor of 50%. How many destination pixels do we
tell Pixman to plot?

In reality, given a transform matrix whose diagonal is (0.5, 0.5, 1),
Pixman is capable of picking out source pixels with index 0, 2, 4, 6 ...
100 - that's 51 pixels - whilst still using a COVER_CLIP fast path. If it
weren't for the pixman_fixed_e offset, it would pick out source pixels 1,
3, 5, 7 ... 99 instead - that's 50 pixels.

Now, if the caller asks for 50 pixels and Pixman uses the offset, then we
get to use the COVER_CLIP fast path, but we've trimmed two pixels off the
right and none off the left.

But if the caller asks for 51 pixels and Pixman doesn't use the offset,
then Pixman realises it needs a value for the out-of-bounds pixel index
101, so it won't use the COVER_CLIP fast path.

So, for the best combination of image quality and speed, the details of
Pixman's rounding algorithm *can* be significant.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-23 Thread Ben Avison

On Thu, 23 Apr 2015 12:46:56 +0100,
Pekka Paalanen  wrote:


On Wed, 15 Apr 2015 17:03:43 +0100
"Ben Avison"  wrote:


For nearest-neighbour scaling, the nth pixel of a source or mask image is 
considered to cover the space (n, n+1]. Yes, that's inclusive at the
upper end and exclusive at the lower end, not what you might expect (and
this is the reason for adjustments by pixman_fixed_e in various places).


Aha. Slightly surprising. Is this only internal to Pixman, or does it
show outside too? That is, something that users should take into
account, needed for the "exact alignment" to hit the COVER_CLIP?


I've never been really sure if it was intended to work that way, but it's
the way the code functions as written. If you change it, you do hit some
subtle differences in the test suite, but I wonder if it's doing what the
author intended. I say that because with most "obvious" scaling factors,
assuming you don't use a translation in the transform matrix (i.e. you
scale from pixel edges), you don't actually hit the borderline position
very often, if ever, so it's strange that they went to any effort to do
anything other than what works out to be the simplest thing to do.

I think some ASCII art might help. Consider a 4x enlargement.

Key:
| integer X in source space
- integer Y in source space
S centre of source pixel (half-integer X and Y)
D point at which source is sampled (centre of destination pixel)

  +---------------+---------------+
  | D   D   D   D | D   D   D   D |
  |               |               |
  | D   D   D   D | D   D   D   D |
  |       S       |       S       |
  | D   D   D   D | D   D   D   D |
  |               |               |
  | D   D   D   D | D   D   D   D |
  +---------------+---------------+
  | D   D   D   D | D   D   D   D |
  |               |               |
  | D   D   D   D | D   D   D   D |
  |       S       |       S       |
  | D   D   D   D | D   D   D   D |
  |               |               |
  | D   D   D   D | D   D   D   D |
  +---------------+---------------+

You can see that we never actually sample any of the boundary lines, so
the adjustment by pixman_fixed_e has no effect in this case. The same is
true of any integer enlargement factor, at least until you start to get
close to 65536x enlargement.

If you're cunning, you might ask about half-integer enlargement factors.
Here's what a 2.5x enlargement looks like:

  +-------------------+-------------------+
  |                   |                   |
  |   D       D       D       D       D   |
  |                   |                   |
  |                   |                   |
  |         S         |         S         |
  |   D       D       D       D       D   |
  |                   |                   |
  |                   |                   |
  |                   |                   |
  +---D-------D-------D-------D-------D---+
  |                   |                   |
  |                   |                   |
  |                   |                   |
  |   D       D       D       D       D   |
  |         S         |         S         |
  |                   |                   |
  |                   |                   |
  |   D       D       D       D       D   |
  |                   |                   |
  +-------------------+-------------------+

But this neglects the fact that this requires a value of 2/5 in the
transform matrix. This, like the reciprocals of all the other half-integer
values (2/3, 2/7, 2/9, 2/11 etc.), is subject to rounding error when
represented in binary, so you won't actually end up hitting the
borderline regularly.

The only case I can think of where you would routinely sample at the
edges is for a power-of-two reduction. For example, this is what a 0.25
scale factor looks like:

  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---D---+---+---+---D---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---D---+---+---+---D---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+

The (n, n+1] rule means that you would sample the 2nd and 6th source
pixels, rather than the 3rd and 7th if you used a [n, n+1) rule.

If I had to hazard a guess, I'd say that the authors were looking solely
at the 0.5 scale factor case and were surprised to find they got the 2nd,
4th, 6th source pixels out rather than the 1st, 3rd, 5th and added the
pixman_fixed_e to "correct" this rather than looking at the wider
picture. But that's just a guess.

The balance seems to shift the other way if you scale from pixel centres
rather than edges, and it becomes easier to hit the edges with

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-23 Thread Pekka Paalanen
On Wed, 15 Apr 2015 17:03:43 +0100
"Ben Avison"  wrote:

> On Wed, 15 Apr 2015 10:39:54 +0100,
> Pekka Paalanen  wrote:
> 
> > On Tue, 14 Apr 2015 19:00:58 +0100
> > "Ben Avison"  wrote:
> >
> >> Assume the reasonable case that you want to plot the whole of
> >> an image of size x,y at a size m,n. You need to set the diagonals of
> >> the transform to
> >>
> >> floor(pixman_fixed_1*(x-1)/(m-1))
> >> floor(pixman_fixed_1*(y-1)/(n-1))
> >> 1
> >>
> >> and then solve for
> >>
> >> / 0.5 \   / 0.5 \
> >> T . | 0.5 | = | 0.5 |
> >> \ 1   /   \ 1   /
> >>
> >> to find the translation coefficients that will ensure the top-left
> >> source pixel aligns exactly to the top-left destination pixel.
> >
> > Urgh, I don't think we account for *that* in e.g. Weston's Pixman
> > renderer. I could bet Weston follows the GL convention of defining the
> > extents of an image region to be the boundaries of the image, not the
> > pixel centers which would leave a half pixel margin outside of the
> > region. I mean, Weston works as if you are transforming geometric
> > primitives like triangles, and rasterization happens later, while it
> > still uses pixel centers at .5 coordinates, meaning that a single pixel
> > covers the X-coordinates [0.0, 1.0).
> >
> > Are you saying that this idea doesn't work with Pixman?
> 
> I had a recollection of Pixman's geometric model, but given that there's
> no documentation I felt the need to go back and check the source code, as
> it can be very easy to get this sort of thing subtly wrong:
> 
> Pixman's scaling is such that pixel centres are similarly positioned
> irrespective of whether you're doing nearest-neighbour or bilinear
> scaling.
> 
> For nearest-neighbour scaling, the nth pixel of a source or mask image is
> considered to cover the space (n, n+1]. Yes, that's inclusive at the
> upper end and exclusive at the lower end, not what you might expect (and
> this is the reason for adjustments by pixman_fixed_e in various places).

Aha. Slightly surprising. Is this only internal to Pixman, or does it
show outside too? That is, something that users should take into
account, needed for the "exact alignment" to hit the COVER_CLIP?

I'm guessing yes, because you do that in affine-bench, reversed... am I
on the right track here?

Sorry if I'm a bit slow here.

> For bilinear scaling, the contribution of the nth pixel scales up
> linearly from 0 at n-0.5 to 1 at n+0.5 then back down to 0 at n+1.5, so
> its overall range of influence is (n-0.5, n+1.5).

Ok, that is as I'd expect, I think.

> Since the centre coordinate of each destination pixel is what is fed
> through the transformation matrix, you can see that under the identity
> transform, there is always a 1:1 mapping from a single source pixel to a
> single destination pixel, irrespective of whether you're nominally doing
> nearest or bilinear scaling. This enables Pixman to select unscaled fast
> paths just by examining the transform.

Right.

> If you scale from the outer extents (in other words, use scale factors of
> x/m, y/n rather than (x-1)/(m-1), (y-1)/(n-1), and align the top-left of
> the first source pixel with the top-left of the first destination pixel)
> then that's fine, as long as:
> 
> a) you're using nearest-neighbour scaling
> or
> b) you're using bilinear scaling and a scale factor > 1 (i.e. reduction
> rather than enlargement)
> 
> otherwise Pixman will decide that it needs to use a small contribution
>  from a single-pixel border just outside the bounds of the source image,
> for which it then needs to start following the repeat-type rules (pad/
> tile/reflect/none).

Ok, yeah. Since Weston is working with geometry rather than pixels, I
think that works just fine... except for the left or top edge, which
like you said for the pixel occupancy rules, would be accounted to an
outside pixel. Sounds like we should be removing pixman_fixed_e in Weston
from the source area's left and top, or something, to hit COVER_CLIP.

Well, that's really off-topic; just for my personal education.


Thanks,
pq


Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-15 Thread Ben Avison

On Wed, 15 Apr 2015 10:39:54 +0100,
Pekka Paalanen  wrote:


On Tue, 14 Apr 2015 19:00:58 +0100
"Ben Avison"  wrote:


Assume the reasonable case that you want to plot the whole of
an image of size x,y at a size m,n. You need to set the diagonals of
the transform to

floor(pixman_fixed_1*(x-1)/(m-1))
floor(pixman_fixed_1*(y-1)/(n-1))
1

and then solve for

/ 0.5 \   / 0.5 \
T . | 0.5 | = | 0.5 |
\ 1   /   \ 1   /

to find the translation coefficients that will ensure the top-left
source pixel aligns exactly to the top-left destination pixel.


Urgh, I don't think we account for *that* in e.g. Weston's Pixman
renderer. I could bet Weston follows the GL convention of defining the
extents of an image region to be the boundaries of the image, not the
pixel centers which would leave a half pixel margin outside of the
region. I mean, Weston works as if you are transforming geometric
primitives like triangles, and rasterization happens later, while it
still uses pixel centers at .5 coordinates, meaning that a single pixel
covers the X-coordinates [0.0, 1.0).

Are you saying that this idea doesn't work with Pixman?


I had a recollection of Pixman's geometric model, but given that there's
no documentation I felt the need to go back and check the source code, as
it can be very easy to get this sort of thing subtly wrong:

Pixman's scaling is such that pixel centres are similarly positioned
irrespective of whether you're doing nearest-neighbour or bilinear
scaling.

For nearest-neighbour scaling, the nth pixel of a source or mask image is
considered to cover the space (n, n+1]. Yes, that's inclusive at the
upper end and exclusive at the lower end, not what you might expect (and
this is the reason for adjustments by pixman_fixed_e in various places).

For bilinear scaling, the contribution of the nth pixel scales up
linearly from 0 at n-0.5 to 1 at n+0.5 then back down to 0 at n+1.5, so
its overall range of influence is (n-0.5, n+1.5).

Since the centre coordinate of each destination pixel is what is fed
through the transformation matrix, you can see that under the identity
transform, there is always a 1:1 mapping from a single source pixel to a
single destination pixel, irrespective of whether you're nominally doing
nearest or bilinear scaling. This enables Pixman to select unscaled fast
paths just by examining the transform.

If you scale from the outer extents (in other words, use scale factors of
x/m, y/n rather than (x-1)/(m-1), (y-1)/(n-1), and align the top-left of
the first source pixel with the top-left of the first destination pixel)
then that's fine, as long as:

a) you're using nearest-neighbour scaling
or
b) you're using bilinear scaling and a scale factor > 1 (i.e. reduction
   rather than enlargement)

otherwise Pixman will decide that it needs to use a small contribution
from a single-pixel border just outside the bounds of the source image,
for which it then needs to start following the repeat-type rules (pad/
tile/reflect/none).


Hmm, sure, Pixman needs to transform pixel centers internally, since it
rasterizes on the destination image... so what
compute_transformed_extents() does is that it assumes the input is
geometry, converts corner pixels into pixel centers for rasterization,
and returns the bounding box of the transformed pixel centers.
Then that is used to calculate what pixels would be accessed in the
source image.

Did I get that right?


Yes. Its only purpose is to determine the (inclusive) first and last
source pixels required, so that the caller can then check to see if they
are within bounds.


Btw. are you sure there cannot be rounding in the hand-written fast
paths? If some assembly path computes an approximation of source
coordinates, could it not hit the "wrong" pixel? Zero coefficient or
not.


I'm pretty sure, yes - at least for affine transforms. The transform
coefficients, inputs and outputs are all x.16 fixed point, with a rule
for handling rounding for the multiplications (round to nearest, with
halves rounding to +infinity) that's enforced in platform-neutral code
like pixman_transform_point_31_16(). As a fast path steps along a
destination row, there are no further multiplications; it just adds the
transform coefficients from the first column to the running total, so no
further rounding is needed.

If any fast path deviated from this, it would fail the test suite, which
requires binary-identical output. Actually, I wonder if this rounding
issue is the reason why projective transforms aren't more widely featured
in the tests?

Ben


Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-15 Thread Pekka Paalanen
On Tue, 14 Apr 2015 19:00:58 +0100
"Ben Avison"  wrote:

> On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen  
> wrote:
> > So the matrices are indexed as matrix[row][column] in Pixman?
> 
> That's my understanding, yes.
> 
> > A short'ish comment somewhere to say why you are doing this offsetting
> > would be nice, and that the offsetting is the reason to allocate a
> > margin.
> 
> Since you've been reworking the patches anyway, do you have enough
> information to add the comments yourself or do you want me to do them?

These and the empty lines I can add myself, I think, if there
is a need for me to revise the patch.

> >> One simple way round this is to apply a 0.5 pixel translation in the
> >> transformation matrix, so that the pattern becomes:
> >>
> >> d[0] =1.00 * s[0]
> >> d[1] =0.50 * s[0] + 0.50 * s[1]
> >> d[2] =  1.00 * s[1]
> >> d[3] =  0.50 * s[1] + 0.50 * s[2]
> >>
> >> but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't
> >> see why the fudge factor is needed at all, at least not in the affine
> >> case; it's described as being to compensate for rounding errors, but
> >> there is no rounding involved in fixed-point affine transforms.
> >
> > Maybe the rounding refers to the pixman_fixed_to_int() calls? I could
> > imagine it is to guarantee that an xmin=0.5 gets really rounded to 0,
> > and xmax=0.5 gets rounded to 1.
> 
> There's a slightly better worked example in the later patch I was
> thinking of:
> 
> http://lists.freedesktop.org/archives/pixman/2014-September/003380.html
> 
> As I say, we're getting a bit ahead of ourselves here, as there were
> about 30 patches between the ones we're reworking at the moment and that
> one. Søren did give some comments on it at the end of his review here:
> 
> http://lists.freedesktop.org/archives/pixman/2014-October/003457.html
> 
> which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond
> the bitmap bounds if a fast path happened to over-read pixels and then
> multiply them by a 0 filter coefficient. I think we can probably cater
> for that better by checking whether a bitmap starts or ends at a page
> boundary, and whether we're reading the first and last pixels of the
> image if it does.

I would hesitate to add logic along the lines of "reading beyond array
bounds is ok if it doesn't cross page boundaries". It might be true,
but it could produce hard-to-understand false-positives in Valgrind,
for example, which tracks every byte individually. It's also fairly
surprising and unexpected condition for choosing different code paths,
IMHO, since there is no hardware reason to do it.

But, I would be happy to enhance the test suite to test for bad
accesses better. The test suite is something I can well work on.

> > To be honest, both cases you describe sound strange to me. Surely if I
> > use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
> > d[0] = 0.5 * s[0] + 0.5 * s[1]
> > when assuming the box filter (if I got my terminology right)...
> 
> You're forgetting it's pixel-centres that are fed through the transform.
> To be honest, I think it's already quite a headache for an application to
> work out how to set up the transform in order to hit a "cover" scaled
> fast path. Assume the reasonable case that you want to plot the whole of
> an image of size x,y at a size m,n. You need to set the diagonals of the
> transform to
> 
> floor(pixman_fixed_1*(x-1)/(m-1))
> floor(pixman_fixed_1*(y-1)/(n-1))
> 1
> 
> and then solve for
> 
>  / 0.5 \   / 0.5 \
> T . | 0.5 | = | 0.5 |
>  \ 1   /   \ 1   /
> 
> to find the translation coefficients that will ensure the top-left source
> pixel aligns exactly to the top-left destination pixel.

Urgh, I don't think we account for *that* in e.g. Weston's Pixman
renderer. I could bet Weston follows the GL convention of defining the
extents of an image region to be the boundaries of the image, not the
pixel centers which would leave a half pixel margin outside of the
region. I mean, Weston works as if you are transforming geometric
primitives like triangles, and rasterization happens later, while it
still uses pixel centers at .5 coordinates, meaning that a single pixel
covers the X-coordinates [0.0, 1.0).

Are you saying that this idea doesn't work with Pixman?

Hmm, sure, Pixman needs to transform pixel centers internally, since it
rasterizes on the destination image... so what
compute_transformed_extents() does is that it assumes the input is
geometry, converts corner pixels into pixel centers for rasterization,
and returns the bounding box of the transformed pixel centers.
Then that is used to calculate what pixels would be accessed in the
source image.

Did I get that right?

> To then require that the caller also knows that you need to nudge things
> along by 8*pixman_fixed_e as well feels like it's requiring too much
> knowledge of Pixman's internals to me. I did

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Ben Avison

On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen  wrote:

So the matrices are indexed as matrix[row][column] in Pixman?


That's my understanding, yes.


A short'ish comment somewhere to say why you are doing this offsetting
would be nice, and that the offsetting is the reason to allocate a
margin.


Since you've been reworking the patches anyway, do you have enough
information to add the comments yourself or do you want me to do them?


One simple way round this is to apply a 0.5 pixel translation in the
transformation matrix, so that the pattern becomes:

d[0] =1.00 * s[0]
d[1] =0.50 * s[0] + 0.50 * s[1]
d[2] =  1.00 * s[1]
d[3] =  0.50 * s[1] + 0.50 * s[2]

but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't
see why the fudge factor is needed at all, at least not in the affine
case; it's described as being to compensate for rounding errors, but
there is no rounding involved in fixed-point affine transforms.


Maybe the rounding refers to the pixman_fixed_to_int() calls? I could
imagine it is to guarantee that an xmin=0.5 gets really rounded to 0,
and xmax=0.5 gets rounded to 1.


There's a slightly better worked example in the later patch I was
thinking of:

http://lists.freedesktop.org/archives/pixman/2014-September/003380.html

As I say, we're getting a bit ahead of ourselves here, as there were
about 30 patches between the ones we're reworking at the moment and that
one. Søren did give some comments on it at the end of his review here:

http://lists.freedesktop.org/archives/pixman/2014-October/003457.html

which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond
the bitmap bounds if a fast path happened to over-read pixels and then
multiply them by a 0 filter coefficient. I think we can probably cater
for that better by checking whether a bitmap starts or ends at a page
boundary, and whether we're reading the first and last pixels of the
image if it does.


To be honest, both cases you describe sound strange to me. Surely if I
use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
d[0] = 0.5 * s[0] + 0.5 * s[1]
when assuming the box filter (if I got my terminology right)...


You're forgetting it's pixel-centres that are fed through the transform.
To be honest, I think it's already quite a headache for an application to
work out how to set up the transform in order to hit a "cover" scaled
fast path. Assume the reasonable case that you want to plot the whole of
an image of size x,y at a size m,n. You need to set the diagonals of the
transform to

floor(pixman_fixed_1*(x-1)/(m-1))
floor(pixman_fixed_1*(y-1)/(n-1))
1

and then solve for

/ 0.5 \   / 0.5 \
T . | 0.5 | = | 0.5 |
\ 1   /   \ 1   /

to find the translation coefficients that will ensure the top-left source
pixel aligns exactly to the top-left destination pixel.

To then require that the caller also knows that you need to nudge things
along by 8*pixman_fixed_e as well feels like it's requiring too much
knowledge of Pixman's internals to me. I didn't really like putting it in
affine-bench to begin with for this reason, but it doesn't work without
it. So I made do with removing it as quickly as I could - the original
intention was for it to be in the same patch series, but obviously the
patch series has grown too large and unwieldy for that to be the case
any more!

When I've seen timing breakdowns of which Pixman operations are being hit
by applications, I've seen far more scaled plots with repeat type PAD
than I'd really expect. My suspicions are that the bulk of these are down
to how difficult it is for applications to set up the transform for
maximum efficiency.


I think my hope was that since we're processing images of size 1920 *
1080 pixels, you wouldn't have any particular pixels persisting in the
cache from the previous iteration (unless perhaps if you're testing a
high-factor enlargement on a platform that has caches that don't
allocate on write).


Right, so is there any need to flush_cache() at all?


Possibly not. I was taking my cue from lowlevel-blt-bench again, where it
flushes the cache between each type of test. And looking at the
implementation of flush_cache() again now, I note that due to the action
of allocate-on-write and physically mapped caches, it won't completely
flush most caches, only one page-size worth! Perhaps just delete it then.


>> +bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, UINT32_MAX, 500, &n, &t1, pixman_image_composite_wrapper);
>> +bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, n, UINT32_MAX, NULL, &t2, pixman_image_composite_empty);

[...]

I think that part I understood. My comment was about having a {} block
without any flow control statements for it. You are just using it to
scope some variables, which suggests that the block should actually be
a separate function.


Oh right. Well, I think it was m

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Pekka Paalanen
On Wed, 08 Apr 2015 02:27:48 +0100
"Ben Avison"  wrote:

> On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen  
> wrote:

> > If there is no transform, why not return the original extents?
> > These have been reduced by a half unit in all four directions.
> 
> It's more obviously relevant in the bilinear scaling case, but a pixel's
> colour is considered to apply to the midpoint of each pixel. Thus an
> image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ...
> (N-1.5), (N-0.5). Therefore, when you're working out which source pixels
> are required to generate a range of output pixels, it's 0.5 and (N-0.5)
> that you need to feed through any transformation matrix. If the no
> transform case were special-cased, you'd then need to special-case it
> again in the calling function because the 0.5 offset applies (albeit at
> different scaling factors) to both source and destination images, so the
> caller will be adding the 0.5 pixel border back on again.

Ok, so the extents going in to compute_transformed_extents() have a
different meaning than the ones coming out. The ones going in define
the edges of the image, while the ones coming out are reduced to the
corner pixel centers.

Which means that it is incorrect to, say, compute extents for two
sequential transforms one by one; you must combine the transforms
first and then call compute_transformed_extents() just once.

I was just surprised by this convention.

> In affine-bench, I calculate image sizes based on bilinear scaling for
> simplicity, since a source or mask image big enough to satisfy COVER_CLIP
> for bilinear scaling will also do so for nearest scaling.
> 
> [compute_transformed_extents]
> >> +tx1 = ty1 = INT64_MAX;
> >> +tx2 = ty2 = INT64_MIN;
> >> +
> >> +for (i = 0; i < 4; ++i)
> >> +{
> >> +pixman_fixed_48_16_t tx, ty;
> >> +pixman_vector_t v;
> >> +
> >> +v.vector[0] = (i & 0x01)? x1 : x2;
> >> +v.vector[1] = (i & 0x02)? y1 : y2;
> >> +v.vector[2] = pixman_fixed_1;
> >> +
> >> +if (!pixman_transform_point (transform, &v))
> >> +return FALSE;
> >> +
> >> +tx = (pixman_fixed_48_16_t)v.vector[0];
> >> +ty = (pixman_fixed_48_16_t)v.vector[1];
> >
> > This works only for affine. This is called affine-bench, so that is
> > natural, yet might be nice to add an assert or something to guarantee
> > no-one will copy this code to something else that uses projective
> > transforms.
> 
> I might be missing something, but it looks to me like
> pixman_transform_point (through its subroutine
> pixman_transform_point_31_16) does handle projective transforms, and what
> this code segment is doing is iterating over the four corners of the
> destination rectangle and finding the maximum and minimum source x/y by
> considering the transformed position of each corner independently. Surely
> the whole set of destination pixels, once passed through even a
> projective matrix, must still lie within this bounding box?

Ahh, another case where I had wrong assumptions.
pixman_transform_point() indeed does guarantee that v.vector[2] is
1.0. I'm used to assuming it's not guaranteed, because the guarantee
leads to a division by zero for points at infinity. But Pixman cannot
deal with points at infinity anyway, so it sort of makes sense in
relation to pixel images.

> [flush_cache]
> >> +volatile const char *x = clean_space;
> [...]
> >> +(void) *x;
> >
> > Are you sure this actually does the read? That the compiler is not
> > allowed to discard the read? That's something I'm never sure of.
> 
> That's the point of the volatile qualifier - the compiler isn't allowed
> to throw away accesses to an object which is declared volatile, it has to
> perform exactly as many accesses as you ask for and in the same order.
>  From the C standard:
> 
> "A volatile declaration may be used to describe an object
> corresponding to a memory-mapped input/output port or an object
> accessed by an asynchronously interrupting function. Actions on
> objects so declared shall not be 'optimized out' by an
> implementation or reordered except as permitted by the rules for
> evaluating expressions."
> 
> What you do need to be careful of these days is that this says nothing
> about the number of or ordering of accesses as viewed by any part of the
> system beyond the L1 cache, including any other cores - that's where
> memory barriers come in. But that's not relevant for the purposes of
> cleaning the cache.

Very good.

> >> +x += 32;
> >
> > Why 32? Is this CACHELINE_LENGTH from lowlevel-blt-bench.c?
> 
> Yes, that's the origin. You might argue for reading the cacheline length
> at runtime, as with cache and page size, but it's less critical here. If
> your cacheline length was actually 64 bytes, then reading every 32nd byte
> would still pull in every cacheline. Reading every single byte would have
> the same effect but be a bit of a waste of time; t

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-07 Thread Ben Avison

On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen  wrote:

affine-bench should be added to .gitignore too.


OK.


+#define L2CACHE_SIZE (128 * 1024)


I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it
come from? If it's really an arch/platform-independent constant, maybe
some test header could have it?


As you will no doubt have noticed, affine-bench is heavily inspired by
lowlevel-blt-bench. Admittedly it's a bit of a cheat to assume one L2
cache size, but in lowlevel-blt-bench, it's only used for sizing the
images used for the L2 test. For those purposes, it's only important that
it's a number big enough that the images are flushed out of the L1 cache
but small enough that they aren't flushed out to memory, so as long as
it's bigger than the largest L1 cache you're likely to encounter and no
smaller than the smallest L2 cache, then it'll do, and saves lots of
messy OS-specific code to determine the cache sizes.

Admittedly, now I look back on the use of the same constant in
affine-bench, I've taken the point of view that it's too difficult in the
presence of significant transformation coefficients to calculate image
sizes that sit within a particular cache, so I'm only bothering to test
the memory-constrained case. To do that, I'm using L2CACHE_SIZE to size
an array used to flush the L2 cache - but that needs the *largest*
typical L2 cache size, not the smallest, so 128K won't cut it here.

While I have easy access to the cache sizes on CPUs that I'm actively
targeting, a general-purpose benchmarker should probably be suitable for
all architectures. Should we read it from the OS somehow? (Anyone know
how best to do that for, say, Linux, Windows, OS X and iOS, in that case?)


Looks like Pixman style is using separate statements for struct
definition and typedefs.


Not exclusively - I think the bits I've looked at usually do, but now
that I count them, I grant you the most common form is separate typedefs.
This particular example was cut-and-pasted from pixman.c but can easily
be changed.


+static pixman_bool_t
+compute_transformed_extents (pixman_transform_t *transform,
+                             const pixman_box32_t *extents,
+                             box_48_16_t *transformed)


CODING_STYLE is asking for alignment here for the arg names.


This was also a cut-and-paste, but good point.


If there is no transform, why not return the original extents?
These have been reduced by a half unit in all four directions.


It's more obviously relevant in the bilinear scaling case, but a pixel's
colour is considered to apply to the midpoint of each pixel. Thus an
image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ...
(N-1.5), (N-0.5). Therefore, when you're working out which source pixels
are required to generate a range of output pixels, it's 0.5 and (N-0.5)
that you need to feed through any transformation matrix. If the
no-transform case were special-cased, you'd then need to special-case it
again in the calling function, because the 0.5 offset applies (albeit at
different scaling factors) to both source and destination images, so the
caller would be adding the 0.5 pixel border back on again.

In affine-bench, I calculate image sizes based on bilinear scaling for
simplicity, since a source or mask image big enough to satisfy COVER_CLIP
for bilinear scaling will also do so for nearest scaling.

[compute_transformed_extents]

+    tx1 = ty1 = INT64_MAX;
+    tx2 = ty2 = INT64_MIN;
+
+    for (i = 0; i < 4; ++i)
+    {
+        pixman_fixed_48_16_t tx, ty;
+        pixman_vector_t v;
+
+        v.vector[0] = (i & 0x01)? x1 : x2;
+        v.vector[1] = (i & 0x02)? y1 : y2;
+        v.vector[2] = pixman_fixed_1;
+
+        if (!pixman_transform_point (transform, &v))
+            return FALSE;
+
+        tx = (pixman_fixed_48_16_t)v.vector[0];
+        ty = (pixman_fixed_48_16_t)v.vector[1];


This works only for affine. This is called affine-bench, so that is
natural, yet might be nice to add an assert or something to guarantee
no-one will copy this code to something else that uses projective
transforms.


I might be missing something, but it looks to me like
pixman_transform_point (through its subroutine
pixman_transform_point_31_16) does handle projective transforms, and what
this code segment is doing is iterating over the four corners of the
destination rectangle and finding the maximum and minimum source x/y by
considering the transformed position of each corner independently. Surely
the whole set of destination pixels, once passed through even a
projective matrix, must still lie within this bounding box?

[create_image]

Too long line, needs breaking up.


OK.

[various places]

Add empty line.


OK.


+    *bits = aligned_malloc (4096, stride * height);


Is there a #define to get the 4096 from? I assume it's page size? What
if the platform has big pages?


I assume the same thing. It's quite a common page size as I understand
it. I know it's not much excuse, but 

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-03-17 Thread Pekka Paalanen
On Tue,  3 Mar 2015 15:24:20 +
Ben Avison  wrote:

> ---
>  test/Makefile.sources |1 +
>  test/affine-bench.c   |  330 +
>  2 files changed, 331 insertions(+), 0 deletions(-)
>  create mode 100644 test/affine-bench.c
> 
> diff --git a/test/Makefile.sources b/test/Makefile.sources
> index c20c34b..8b0e855 100644
> --- a/test/Makefile.sources
> +++ b/test/Makefile.sources
> @@ -37,6 +37,7 @@ OTHERPROGRAMS = \
>   radial-perf-test\
>  check-formats   \
>   scaling-bench   \
> + affine-bench\
>   $(NULL)

affine-bench should be added to .gitignore too.

>  
>  # Utility functions
> diff --git a/test/affine-bench.c b/test/affine-bench.c
> new file mode 100644
> index 000..55478d6
> --- /dev/null
> +++ b/test/affine-bench.c
> @@ -0,0 +1,330 @@
> +/*
> + * Copyright © 2014 RISC OS Open Ltd
> + *
> + * Permission to use, copy, modify, distribute, and sell this software and its
> + * documentation for any purpose is hereby granted without fee, provided that
> + * the above copyright notice appear in all copies and that both that
> + * copyright notice and this permission notice appear in supporting
> + * documentation, and that the name of the copyright holders not be used in
> + * advertising or publicity pertaining to distribution of the software without
> + * specific, written prior permission.  The copyright holders make no
> + * representations about the suitability of this software for any purpose. It
> + * is provided "as is" without express or implied warranty.
> + *
> + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> + * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> + * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> + * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> + * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
> + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
> + * SOFTWARE.
> + *
> + * Author:  Ben Avison (bavi...@riscosopen.org)
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "utils.h"
> +
> +#ifdef HAVE_GETTIMEOFDAY
> +#include <sys/time.h>
> +#else
> +#include <time.h>
> +#endif
> +
> +#define WIDTH  1920
> +#define HEIGHT 1080
> +#define L2CACHE_SIZE (128 * 1024)

Hi,

I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it
come from? If it's really an arch/platform-independent constant, maybe
some test header could have it?

> +
> +typedef struct
> +{
> +    pixman_fixed_48_16_t x1;
> +    pixman_fixed_48_16_t y1;
> +    pixman_fixed_48_16_t x2;
> +    pixman_fixed_48_16_t y2;
> +} box_48_16_t;

Looks like Pixman style is using separate statements for struct
definition and typedefs.

> +
> +static pixman_bool_t
> +compute_transformed_extents (pixman_transform_t *transform,
> +                             const pixman_box32_t *extents,
> +                             box_48_16_t *transformed)

CODING_STYLE is asking for alignment here for the arg names.

> +{
> +    pixman_fixed_48_16_t tx1, ty1, tx2, ty2;
> +    pixman_fixed_t x1, y1, x2, y2;
> +    int i;
> +
> +    x1 = pixman_int_to_fixed (extents->x1) + pixman_fixed_1 / 2;
> +    y1 = pixman_int_to_fixed (extents->y1) + pixman_fixed_1 / 2;
> +    x2 = pixman_int_to_fixed (extents->x2) - pixman_fixed_1 / 2;
> +    y2 = pixman_int_to_fixed (extents->y2) - pixman_fixed_1 / 2;
> +
> +    if (!transform)
> +    {
> +        transformed->x1 = x1;
> +        transformed->y1 = y1;
> +        transformed->x2 = x2;
> +        transformed->y2 = y2;

If there is no transform, why not return the original extents?
These have been reduced by a half unit in all four directions.

> +
> +        return TRUE;
> +    }
> +
> +    tx1 = ty1 = INT64_MAX;
> +    tx2 = ty2 = INT64_MIN;
> +
> +    for (i = 0; i < 4; ++i)
> +    {
> +        pixman_fixed_48_16_t tx, ty;
> +        pixman_vector_t v;
> +
> +        v.vector[0] = (i & 0x01)? x1 : x2;
> +        v.vector[1] = (i & 0x02)? y1 : y2;
> +        v.vector[2] = pixman_fixed_1;
> +
> +        if (!pixman_transform_point (transform, &v))
> +            return FALSE;
> +
> +        tx = (pixman_fixed_48_16_t)v.vector[0];
> +        ty = (pixman_fixed_48_16_t)v.vector[1];

This works only for affine. This is called affine-bench, so that is
natural, yet might be nice to add an assert or something to guarantee
no-one will copy this code to something else that uses projective
transforms.

Or, maybe having "affine" as part of the function name?

> +
> +        if (tx < tx1)
> +            tx1 = tx;
> +        if (ty < ty1)
> +            ty1 = ty;
> +        if (tx > tx2)
> +            tx2 = tx;
> +        if (ty > ty2)
> +            ty2 = ty;
> +    }
> +
> +    transformed->x1 = tx1;
> +    transformed->y1 = ty1;
> +t