Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Thu, 23 Apr 2015 17:12:59 +0100, I wrote:

> I imagine most of the time, you'll have a source image of fixed size, and you'll either have a target destination size (in which case your task is to calculate the transform matrix coefficients) or you'll have a target scaling factor (in which case your task is to work out the destination size that can be achieved without running off the sides of the source image).

Just to add to this, an example has come to mind that demonstrates why higher level software should really know about the pixman_fixed_e offset for nearest-neighbour scaling in order to make best use of Pixman.

Assume we have a source image that's 101 pixels wide, and we know it has to be plotted at a scale factor of 50%. How many destination pixels do we tell Pixman to plot?

In reality, given a transform matrix whose diagonal is (0.5, 0.5, 1), Pixman is capable of picking out source pixels with index 0, 2, 4, 6 ... 100 - that's 51 pixels - whilst still using a COVER_CLIP fast path. If it weren't for the pixman_fixed_e offset, it would pick out source pixels 1, 3, 5, 7 ... 99 instead - that's 50 pixels.

Now, if the caller asks for 50 pixels and Pixman uses the offset, then we get to use the COVER_CLIP fast path, but we've trimmed two pixels off the right and none off the left. But if the caller asks for 51 pixels and Pixman doesn't use the offset, then Pixman realises it needs a value for out-of-bounds pixel index 101, so it won't use the COVER_CLIP fast path.

So, for the best combination of image quality and speed, the details of Pixman's rounding algorithm *can* be significant.

Ben

_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Thu, 23 Apr 2015 12:46:56 +0100, Pekka Paalanen wrote:
> On Wed, 15 Apr 2015 17:03:43 +0100
> "Ben Avison" wrote:
>> For nearest-neighbour scaling, the nth pixel of a source or mask image is considered to cover the space (n, n+1]. Yes, that's inclusive at the upper end and exclusive at the lower end, not what you might expect (and this is the reason for adjustments by pixman_fixed_e in various places).
>
> Aha. Slightly surprising. Is this only internal to Pixman, or does it show outside too? That is, something that users should take into account, needed for the "exact alignment" to hit the COVER_CLIP?

I've never been really sure if it was intended to work that way, but it's the way the code functions as written. If you change it, you do hit some subtle differences in the test suite, but I wonder if it's doing what the author intended. I say that because with most "obvious" scaling factors, assuming you don't use a translation in the transform matrix (i.e. you scale from pixel edges), you don't actually hit the borderline position very often, if ever, so it's strange that they went to any effort to do anything other than what works out to be the simplest thing to do.

I think some ASCII art might help. Consider a 4x enlargement. Key:

  |  integer X in source space
  -  integer Y in source space
  S  centre of source pixel (half-integer X and Y)
  D  point at which source is sampled (centre of destination pixel)

  +---------+---------+
  | D D D D | D D D D |
  |         |         |
  | D D D D | D D D D |
  |    S    |    S    |
  | D D D D | D D D D |
  |         |         |
  | D D D D | D D D D |
  +---------+---------+
  | D D D D | D D D D |
  |         |         |
  | D D D D | D D D D |
  |    S    |    S    |
  | D D D D | D D D D |
  |         |         |
  | D D D D | D D D D |
  +---------+---------+

You can see that we never actually sample any of the boundary lines, so the adjustment by pixman_fixed_e has no effect in this case. The same is true of any integer enlargement factor, at least until you start to get close to 65536x enlargement. If you're cunning, you might ask about half-integer enlargement factors.
Here's what a 2.5x enlargement looks like:

  +---------+---------+
  |         |         |
  | D   D   D   D   D |
  |    S    |    S    |
  | D   D   D   D   D |
  |         |         |
  +-D---D---D---D---D-+
  |         |         |
  | D   D   D   D   D |
  |    S    |    S    |
  | D   D   D   D   D |
  |         |         |
  +---------+---------+

But this neglects the fact that this requires a value of 2/5 in the transform matrix. This, like the reciprocal of every other half-integer value (2/3, 2/7, 2/9, 2/11 etc.), is subject to rounding error when represented as binary, so you won't actually end up hitting the borderline regularly.

The only case I can think of where you would routinely sample at the edges is for a power-of-two reduction. For example, this is what a 0.25 scale factor looks like:

  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---D---+---+---+---D---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---D---+---+---+---D---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+
  | S | S | S | S | S | S | S | S |
  +---+---+---+---+---+---+---+---+

The (n, n+1] rule means that you would sample the 2nd and 6th source pixels, rather than the 3rd and 7th if you used a [n, n+1) rule.

If I had to hazard a guess, I'd say that the authors were looking solely at the 0.5 scale factor case and were surprised to find they got the 2nd, 4th, 6th source pixels out rather than the 1st, 3rd, 5th, and added the pixman_fixed_e to "correct" this rather than looking at the wider picture. But that's just a guess.

The balance seems to shift the other way if you scale from pixel centres rather than edges, and it becomes easier to hit the edges with
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Wed, 15 Apr 2015 17:03:43 +0100
"Ben Avison" wrote:

> On Wed, 15 Apr 2015 10:39:54 +0100,
> Pekka Paalanen wrote:
>
> > On Tue, 14 Apr 2015 19:00:58 +0100
> > "Ben Avison" wrote:
> >
> >> Assume the reasonable case that you want to plot the whole of an image of size x,y at a size m,n. You need to set the diagonals of the transform to
> >>
> >> floor(pixman_fixed_1*(x-1)/(m-1))
> >> floor(pixman_fixed_1*(y-1)/(n-1))
> >> 1
> >>
> >> and then solve for
> >>
> >>     / 0.5 \   / 0.5 \
> >> T . | 0.5 | = | 0.5 |
> >>     \  1  /   \  1  /
> >>
> >> to find the translation coefficients that will ensure the top-left source pixel aligns exactly to the top-left destination pixel.
> >
> > Urgh, I don't think we account for *that* in e.g. Weston's Pixman renderer. I could bet Weston follows the GL convention of defining the extents of an image region to be the boundaries of the image, not the pixel centers, which would leave a half pixel margin outside of the region. I mean, Weston works as if you are transforming geometric primitives like triangles, and rasterization happens later, while it still uses pixel centers at .5 coordinates, meaning that a single pixel covers the X-coordinates [0.0, 1.0).
> >
> > Are you saying that this idea doesn't work with Pixman?
>
> I had a recollection of Pixman's geometric model, but given that there's no documentation I felt the need to go back and check the source code, as it can be very easy to get this sort of thing subtly wrong:
>
> Pixman's scaling is such that pixel centres are similarly positioned irrespective of whether you're doing nearest-neighbour or bilinear scaling.
>
> For nearest-neighbour scaling, the nth pixel of a source or mask image is considered to cover the space (n, n+1]. Yes, that's inclusive at the upper end and exclusive at the lower end, not what you might expect (and this is the reason for adjustments by pixman_fixed_e in various places).

Aha. Slightly surprising.
Is this only internal to Pixman, or does it show outside too? That is, something that users should take into account, needed for the "exact alignment" to hit the COVER_CLIP? I'm guessing yes, because you do that in affine-bench, reversed... am I on the right track here? Sorry if I'm a bit slow here.

> For bilinear scaling, the contribution of the nth pixel scales up linearly from 0 at n-0.5 to 1 at n+0.5, then back down to 0 at n+1.5, so its overall range of influence is (n-0.5, n+1.5).

Ok, that is as I'd expect, I think.

> Since the centre coordinate of each destination pixel is what is fed through the transformation matrix, you can see that under the identity transform, there is always a 1:1 mapping from a single source pixel to a single destination pixel, irrespective of whether you're nominally doing nearest or bilinear scaling. This enables Pixman to select unscaled fast paths just by examining the transform.

Right.

> If you scale from the outer extents (in other words, use scale factors of x/m, y/n rather than (x-1)/(m-1), (y-1)/(n-1), and align the top-left of the first source pixel with the top-left of the first destination pixel) then that's fine, as long as:
>
> a) you're using nearest-neighbour scaling
> or
> b) you're using bilinear scaling and a scale factor > 1 (i.e. reduction rather than enlargement)
>
> otherwise Pixman will decide that it needs to use a small contribution from a single-pixel border just outside the bounds of the source image, for which it then needs to start following the repeat-type rules (pad/tile/reflect/none).

Ok, yeah. Since Weston is working with geometry rather than pixels, I think that works just fine... except for the left or top edge, which, like you said for the pixel occupancy rules, would be accounted to an outside pixel. Sounds like we should be removing pixman_fixed_e in Weston from the source area left and top, or something, to hit COVER_CLIP.
Well, that's really off-topic; just for my personal education.

Thanks,
pq
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Wed, 15 Apr 2015 10:39:54 +0100, Pekka Paalanen wrote:
> On Tue, 14 Apr 2015 19:00:58 +0100
> "Ben Avison" wrote:
>
>> Assume the reasonable case that you want to plot the whole of an image of size x,y at a size m,n. You need to set the diagonals of the transform to
>>
>> floor(pixman_fixed_1*(x-1)/(m-1))
>> floor(pixman_fixed_1*(y-1)/(n-1))
>> 1
>>
>> and then solve for
>>
>>     / 0.5 \   / 0.5 \
>> T . | 0.5 | = | 0.5 |
>>     \  1  /   \  1  /
>>
>> to find the translation coefficients that will ensure the top-left source pixel aligns exactly to the top-left destination pixel.
>
> Urgh, I don't think we account for *that* in e.g. Weston's Pixman renderer. I could bet Weston follows the GL convention of defining the extents of an image region to be the boundaries of the image, not the pixel centers, which would leave a half pixel margin outside of the region. I mean, Weston works as if you are transforming geometric primitives like triangles, and rasterization happens later, while it still uses pixel centers at .5 coordinates, meaning that a single pixel covers the X-coordinates [0.0, 1.0).
>
> Are you saying that this idea doesn't work with Pixman?

I had a recollection of Pixman's geometric model, but given that there's no documentation I felt the need to go back and check the source code, as it can be very easy to get this sort of thing subtly wrong:

Pixman's scaling is such that pixel centres are similarly positioned irrespective of whether you're doing nearest-neighbour or bilinear scaling.

For nearest-neighbour scaling, the nth pixel of a source or mask image is considered to cover the space (n, n+1]. Yes, that's inclusive at the upper end and exclusive at the lower end, not what you might expect (and this is the reason for adjustments by pixman_fixed_e in various places).

For bilinear scaling, the contribution of the nth pixel scales up linearly from 0 at n-0.5 to 1 at n+0.5, then back down to 0 at n+1.5, so its overall range of influence is (n-0.5, n+1.5).
Since the centre coordinate of each destination pixel is what is fed through the transformation matrix, you can see that under the identity transform, there is always a 1:1 mapping from a single source pixel to a single destination pixel, irrespective of whether you're nominally doing nearest or bilinear scaling. This enables Pixman to select unscaled fast paths just by examining the transform.

If you scale from the outer extents (in other words, use scale factors of x/m, y/n rather than (x-1)/(m-1), (y-1)/(n-1), and align the top-left of the first source pixel with the top-left of the first destination pixel) then that's fine, as long as:

a) you're using nearest-neighbour scaling
or
b) you're using bilinear scaling and a scale factor > 1 (i.e. reduction rather than enlargement)

otherwise Pixman will decide that it needs to use a small contribution from a single-pixel border just outside the bounds of the source image, for which it then needs to start following the repeat-type rules (pad/tile/reflect/none).

> Hmm, sure, Pixman needs to transform pixel centers internally, since it rasterizes on the destination image... so what compute_transformed_extents() does is that it assumes the input is geometry, converts corner pixels into pixel centers for rasterization, and returns the bounding box of the transformed pixel centers. Then that is used to calculate what pixels would be accessed in the source image. Did I get that right?

Yes. Its only purpose is to determine the (inclusive) first and last source pixels required, so that the caller can then check to see if they are within bounds.

> Btw. are you sure there cannot be rounding in the hand-written fast paths? If some assembly path computes an approximation of source coordinates, could it not hit the "wrong" pixel? Zero coefficient or not.

I'm pretty sure, yes - at least for affine transforms.
The transform coefficients, inputs and outputs are all x.16 fixed point, with a rule for handling rounding for the multiplications (round to nearest, with halves rounding to +infinity) that's enforced in platform-neutral code like pixman_transform_point_31_16(). As a fast path steps along a destination row, there are no further multiplications; it just adds the transform coefficients from the first column to the running total, so no further rounding is needed. If any fast path deviated from this, it would fail the test suite, which requires binary-identical output.

Actually, I wonder if this rounding issue is the reason why projective transforms aren't more widely featured in the tests?

Ben
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 14 Apr 2015 19:00:58 +0100
"Ben Avison" wrote:

> On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen wrote:
>
> > So the matrices are indexed as matrix[row][column] in Pixman?
>
> That's my understanding, yes.
>
> > A short'ish comment somewhere to say why you are doing this offsetting would be nice, and that the offsetting is the reason to allocate a margin.
>
> Since you've been reworking the patches anyway, do you have enough information to add the comments yourself or do you want me to do them?

This and the empty lines I can also add myself, I think, if there is a need for me to revise the patch.

> >> One simple way round this is to apply a 0.5 pixel translation in the transformation matrix, so that the pattern becomes:
> >>
> >> d[0] = 1.00 * s[0]
> >> d[1] = 0.50 * s[0] + 0.50 * s[1]
> >> d[2] = 1.00 * s[1]
> >> d[3] = 0.50 * s[1] + 0.50 * s[2]
> >>
> >> but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't see why the fudge factor is needed at all, at least not in the affine case; it's described as being to compensate for rounding errors, but there is no rounding involved in fixed-point affine transforms.
> >
> > Maybe the rounding refers to the pixman_fixed_to_int() calls? I could imagine it is to guarantee that an xmin=0.5 gets really rounded to 0, and xmax=0.5 gets rounded to 1.
>
> There's a slightly better worked example in the later patch I was thinking of:
>
> http://lists.freedesktop.org/archives/pixman/2014-September/003380.html
>
> As I say, we're getting a bit ahead of ourselves here, as there were about 30 patches between the ones we're reworking at the moment and that one.
> Søren did give some comments on it at the end of his review here:
>
> http://lists.freedesktop.org/archives/pixman/2014-October/003457.html
>
> which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond the bitmap bounds if a fast path happened to over-read pixels and then multiply them by a 0 filter coefficient. I think we can probably cater for that better by checking whether a bitmap starts or ends at a page boundary, and whether we're reading the first and last pixels of the image if it does.

I would hesitate to add logic along the lines of "reading beyond array bounds is ok if it doesn't cross page boundaries". It might be true, but it could produce hard-to-understand false positives in Valgrind, for example, which tracks every byte individually. It's also a fairly surprising and unexpected condition for choosing different code paths, IMHO, since there is no hardware reason to do it. But I would be happy to enhance the test suite to test for bad accesses better. The test suite is something I can well work on.

> > To be honest, both cases you describe sound strange to me. Surely if I use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
> >
> > d[0] = 0.5 * s[0] + 0.5 * s[1]
> >
> > when assuming the box filter (if I got my terminology right)...
>
> You're forgetting it's pixel-centres that are fed through the transform. To be honest, I think it's already quite a headache for an application to work out how to set up the transform in order to hit a "cover" scaled fast path. Assume the reasonable case that you want to plot the whole of an image of size x,y at a size m,n. You need to set the diagonals of the transform to
>
> floor(pixman_fixed_1*(x-1)/(m-1))
> floor(pixman_fixed_1*(y-1)/(n-1))
> 1
>
> and then solve for
>
>     / 0.5 \   / 0.5 \
> T . | 0.5 | = | 0.5 |
>     \  1  /   \  1  /
>
> to find the translation coefficients that will ensure the top-left source pixel aligns exactly to the top-left destination pixel.
Urgh, I don't think we account for *that* in e.g. Weston's Pixman renderer. I could bet Weston follows the GL convention of defining the extents of an image region to be the boundaries of the image, not the pixel centers, which would leave a half pixel margin outside of the region. I mean, Weston works as if you are transforming geometric primitives like triangles, and rasterization happens later, while it still uses pixel centers at .5 coordinates, meaning that a single pixel covers the X-coordinates [0.0, 1.0).

Are you saying that this idea doesn't work with Pixman?

Hmm, sure, Pixman needs to transform pixel centers internally, since it rasterizes on the destination image... so what compute_transformed_extents() does is that it assumes the input is geometry, converts corner pixels into pixel centers for rasterization, and returns the bounding box of the transformed pixel centers. Then that is used to calculate what pixels would be accessed in the source image. Did I get that right?

> To then require that the caller also knows that you need to nudge things along by 8*pixman_fixed_e as well feels like it's requiring too much knowledge of Pixman's internals to me. I did
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen wrote:

> So the matrices are indexed as matrix[row][column] in Pixman?

That's my understanding, yes.

> A short'ish comment somewhere to say why you are doing this offsetting would be nice, and that the offsetting is the reason to allocate a margin.

Since you've been reworking the patches anyway, do you have enough information to add the comments yourself or do you want me to do them?

>> One simple way round this is to apply a 0.5 pixel translation in the transformation matrix, so that the pattern becomes:
>>
>> d[0] = 1.00 * s[0]
>> d[1] = 0.50 * s[0] + 0.50 * s[1]
>> d[2] = 1.00 * s[1]
>> d[3] = 0.50 * s[1] + 0.50 * s[2]
>>
>> but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't see why the fudge factor is needed at all, at least not in the affine case; it's described as being to compensate for rounding errors, but there is no rounding involved in fixed-point affine transforms.
>
> Maybe the rounding refers to the pixman_fixed_to_int() calls? I could imagine it is to guarantee that an xmin=0.5 gets really rounded to 0, and xmax=0.5 gets rounded to 1.

There's a slightly better worked example in the later patch I was thinking of:

http://lists.freedesktop.org/archives/pixman/2014-September/003380.html

As I say, we're getting a bit ahead of ourselves here, as there were about 30 patches between the ones we're reworking at the moment and that one. Søren did give some comments on it at the end of his review here:

http://lists.freedesktop.org/archives/pixman/2014-October/003457.html

which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond the bitmap bounds if a fast path happened to over-read pixels and then multiply them by a 0 filter coefficient. I think we can probably cater for that better by checking whether a bitmap starts or ends at a page boundary, and whether we're reading the first and last pixels of the image if it does.

> To be honest, both cases you describe sound strange to me.
> Surely if I use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
>
> d[0] = 0.5 * s[0] + 0.5 * s[1]
>
> when assuming the box filter (if I got my terminology right)...

You're forgetting it's pixel-centres that are fed through the transform. To be honest, I think it's already quite a headache for an application to work out how to set up the transform in order to hit a "cover" scaled fast path. Assume the reasonable case that you want to plot the whole of an image of size x,y at a size m,n. You need to set the diagonals of the transform to

floor(pixman_fixed_1*(x-1)/(m-1))
floor(pixman_fixed_1*(y-1)/(n-1))
1

and then solve for

    / 0.5 \   / 0.5 \
T . | 0.5 | = | 0.5 |
    \  1  /   \  1  /

to find the translation coefficients that will ensure the top-left source pixel aligns exactly to the top-left destination pixel.

To then require that the caller also knows that you need to nudge things along by 8*pixman_fixed_e as well feels like it's requiring too much knowledge of Pixman's internals to me. I didn't really like putting it in affine-bench to begin with for this reason, but it doesn't work without it. So I made do with removing it as quickly as I could - the original intention was for it to be in the same patch series, but obviously the patch series has grown too large and unwieldy for that to be the case any more!

When I've seen timing breakdowns of which Pixman operations are being hit by applications, I've seen far more scaled plots with repeat type PAD than I'd really expect. My suspicions are that the bulk of these are down to how difficult it is for applications to set up the transform for maximum efficiency.

>> I think my hope was that since we're processing images of size 1920 * 1080 pixels, you wouldn't have any particular pixels persisting in the cache from the previous iteration (unless perhaps if you're testing a high-factor enlargement on a platform that has caches that don't allocate on write).
>
> Right, so is there any need to flush_cache() at all?

Possibly not.
I was taking my cue from lowlevel-blt-bench again, where it flushes the cache between each type of test. And looking at the implementation of flush_cache() again now, I note that due to the action of allocate-on-write and physically mapped caches, it won't completely flush most caches, only one page-size worth! Perhaps just delete it then.

>> + bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, UINT32_MAX, 500, &n, &t1, pixman_image_composite_wrapper);
>> + bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, n, UINT32_MAX, NULL, &t2, pixman_image_composite_empty);
[...]

> I think that part I understood. My comment was about having a {} block without any flow control statements for it. You are just using it to scope some variables, which suggests that the block should actually be a separate function.

Oh right. Well, I think it was m
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Wed, 08 Apr 2015 02:27:48 +0100
"Ben Avison" wrote:

> On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen wrote:
>
> > If there is no transform, why not return the original extents? These have been reduced by a half unit in all four directions.
>
> It's more obviously relevant in the bilinear scaling case, but a pixel's colour is considered to apply to the midpoint of each pixel. Thus an image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ... (N-1.5), (N-0.5). Therefore, when you're working out which source pixels are required to generate a range of output pixels, it's 0.5 and (N-0.5) that you need to feed through any transformation matrix. If the no transform case were special-cased, you'd then need to special-case it again in the calling function, because the 0.5 offset applies (albeit at different scaling factors) to both source and destination images, so the caller will be adding the 0.5 pixel border back on again.

Ok, so the extents going in to compute_transformed_extents() have a different meaning than the ones coming out. The ones going in define the edges of the image, while the ones coming out are reduced to the corner pixel centers. Which means that it is incorrect to, say, compute extents for two sequential transforms one by one; you must combine the transforms first and then call compute_transformed_extents() just once.

I was just surprised by this convention.

> In affine-bench, I calculate image sizes based on bilinear scaling for simplicity, since a source or mask image big enough to satisfy COVER_CLIP for bilinear scaling will also do so for nearest scaling.

> [compute_transformed_extents]
> >> +    tx1 = ty1 = INT64_MAX;
> >> +    tx2 = ty2 = INT64_MIN;
> >> +
> >> +    for (i = 0; i < 4; ++i)
> >> +    {
> >> +        pixman_fixed_48_16_t tx, ty;
> >> +        pixman_vector_t v;
> >> +
> >> +        v.vector[0] = (i & 0x01) ? x1 : x2;
> >> +        v.vector[1] = (i & 0x02) ? y1 : y2;
> >> +        v.vector[2] = pixman_fixed_1;
> >> +
> >> +        if (!pixman_transform_point (transform, &v))
> >> +            return FALSE;
> >> +
> >> +        tx = (pixman_fixed_48_16_t)v.vector[0];
> >> +        ty = (pixman_fixed_48_16_t)v.vector[1];
> >
> > This works only for affine. This is called affine-bench, so that is natural, yet might be nice to add an assert or something to guarantee no-one will copy this code to something else that uses projective transforms.
>
> I might be missing something, but it looks to me like pixman_transform_point (through its subroutine pixman_transform_point_31_16) does handle projective transforms, and what this code segment is doing is iterating over the four corners of the destination rectangle and finding the maximum and minimum source x/y by considering the transformed position of each corner independently. Surely the whole set of destination pixels, once passed through even a projective matrix, must still lie within this bounding box?

Ahh, another case where I had wrong assumptions. pixman_transform_point() indeed does guarantee that v.vector[2] is 1.0. I'm used to assuming it's not guaranteed, because the guarantee leads to a division by zero for points at infinity. But Pixman cannot deal with points at infinity anyway, so it sort of makes sense in relation to pixel images.

> [flush_cache]
> >> +    volatile const char *x = clean_space;
[...]
> >> +    (void) *x;
> >
> > Are you sure this actually does the read? That the compiler is not allowed to discard the read? That's something I'm never sure of.
>
> That's the point of the volatile qualifier - the compiler isn't allowed to throw away accesses to an object which is declared volatile; it has to perform exactly as many accesses as you ask for, and in the same order. From the C standard:
>
> "A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be 'optimized out' by an implementation or reordered except as permitted by the rules for evaluating expressions."
>
> What you do need to be careful of these days is that this says nothing about the number or ordering of accesses as viewed by any part of the system beyond the L1 cache, including any other cores - that's where memory barriers come in. But that's not relevant for the purposes of cleaning the cache.

Very good.

> >> +    x += 32;
> >
> > Why 32? Is this CACHELINE_LENGTH from lowlevel-blt-bench.c?
>
> Yes, that's the origin. You might argue for reading the cacheline length at runtime, as with cache and page size, but it's less critical here. If your cacheline length was actually 64 bytes, then reading every 32nd byte would still pull in every cacheline. Reading every single byte would have the same effect but be a bit of a waste of time; t
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen wrote:

> affine-bench should be added to .gitignore too.

OK.

>> +#define L2CACHE_SIZE (128 * 1024)
>
> I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it come from? If it's really an arch/platform-independent constant, maybe some test header could have it?

As you will no doubt have noticed, affine-bench is heavily inspired by lowlevel-blt-bench. Admittedly it's a bit of a cheat to assume one L2 cache size, but in lowlevel-blt-bench it's only used for sizing the images used for the L2 test. For those purposes, it's only important that it's a number big enough that the images are flushed out of the L1 cache but small enough that they aren't flushed out to memory, so as long as it's bigger than the largest L1 cache you're likely to encounter and no smaller than the smallest L2 cache, it'll do, and it saves lots of messy OS-specific code to determine the cache sizes.

Admittedly, now I look back on the use of the same constant in affine-bench, I've taken the point of view that it's too difficult, in the presence of significant transformation coefficients, to calculate image sizes that sit within a particular cache, so I'm only bothering to test the memory-constrained case. To do that, I'm using L2CACHE_SIZE to size an array used to flush the L2 cache - but that needs the *largest* typical L2 cache size, not the smallest, so 128K won't cut it here. While I have easy access to the cache sizes on CPUs that I'm actively targeting, a general-purpose benchmarker should probably be suitable for all architectures. Should we read it from the OS somehow? (Anyone know how best to do that for, say, Linux, Windows, OS X and iOS, in that case?)

> Looks like Pixman style is using separate statements for struct definition and typedefs.

Not exclusively - I think the bits I've looked at usually do, but now that I count them, I grant you the most common form is separate typedefs.
This particular example was cut-and-pasted from pixman.c but can easily be changed.

>> +static pixman_bool_t
>> +compute_transformed_extents (pixman_transform_t *transform,
>> + const pixman_box32_t *extents,
>> + box_48_16_t *transformed)
>
> CODING_STYLE is asking for alignment here for the arg names.

This was also a cut-and-paste, but good point.

> If there is no transform, why not return the original extents? These have been reduced by a half unit in all four directions.

It's more obviously relevant in the bilinear scaling case, but a pixel's colour is considered to apply to the midpoint of each pixel. Thus an image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ... (N-1.5), (N-0.5). Therefore, when you're working out which source pixels are required to generate a range of output pixels, it's 0.5 and (N-0.5) that you need to feed through any transformation matrix. If the no transform case were special-cased, you'd then need to special-case it again in the calling function, because the 0.5 offset applies (albeit at different scaling factors) to both source and destination images, so the caller will be adding the 0.5 pixel border back on again.

In affine-bench, I calculate image sizes based on bilinear scaling for simplicity, since a source or mask image big enough to satisfy COVER_CLIP for bilinear scaling will also do so for nearest scaling.

[compute_transformed_extents]

>> +    tx1 = ty1 = INT64_MAX;
>> +    tx2 = ty2 = INT64_MIN;
>> +
>> +    for (i = 0; i < 4; ++i)
>> +    {
>> +        pixman_fixed_48_16_t tx, ty;
>> +        pixman_vector_t v;
>> +
>> +        v.vector[0] = (i & 0x01) ? x1 : x2;
>> +        v.vector[1] = (i & 0x02) ? y1 : y2;
>> +        v.vector[2] = pixman_fixed_1;
>> +
>> +        if (!pixman_transform_point (transform, &v))
>> +            return FALSE;
>> +
>> +        tx = (pixman_fixed_48_16_t)v.vector[0];
>> +        ty = (pixman_fixed_48_16_t)v.vector[1];
>
> This works only for affine. This is called affine-bench, so that is natural, yet might be nice to add an assert or something to guarantee no-one will copy this code to something else that uses projective transforms.
I might be missing something, but it looks to me like pixman_transform_point (through its subroutine pixman_transform_point_31_16) does handle projective transforms, and what this code segment is doing is iterating over the four corners of the destination rectangle and finding the maximum and minimum source x/y by considering the transformed position of each corner independently. Surely the whole set of destination pixels, once passed through even a projective matrix, must still lie within this bounding box?

[create_image]
> Too long line, needs breaking up.

OK.

[various places]
> Add empty line.

OK.

> +    *bits = aligned_malloc (4096, stride * height);
>
> Is there a #define to get the 4096 from? I assume it's page size? What
> if the platform has big pages?

I assume the same thing. It's quite a common page size as I understand it. I know it's not much excuse, but
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 3 Mar 2015 15:24:20 + Ben Avison wrote:

> ---
>  test/Makefile.sources |    1 +
>  test/affine-bench.c   |  330 +
>  2 files changed, 331 insertions(+), 0 deletions(-)
>  create mode 100644 test/affine-bench.c
>
> diff --git a/test/Makefile.sources b/test/Makefile.sources
> index c20c34b..8b0e855 100644
> --- a/test/Makefile.sources
> +++ b/test/Makefile.sources
> @@ -37,6 +37,7 @@ OTHERPROGRAMS = \
> 	radial-perf-test	\
> 	check-formats		\
> 	scaling-bench		\
> +	affine-bench		\
> 	$(NULL)

affine-bench should be added to .gitignore too.

>
> # Utility functions
> diff --git a/test/affine-bench.c b/test/affine-bench.c
> new file mode 100644
> index 000..55478d6
> --- /dev/null
> +++ b/test/affine-bench.c
> @@ -0,0 +1,330 @@
> +/*
> + * Copyright © 2014 RISC OS Open Ltd
> + *
> + * Permission to use, copy, modify, distribute, and sell this software and its
> + * documentation for any purpose is hereby granted without fee, provided that
> + * the above copyright notice appear in all copies and that both that
> + * copyright notice and this permission notice appear in supporting
> + * documentation, and that the name of the copyright holders not be used in
> + * advertising or publicity pertaining to distribution of the software without
> + * specific, written prior permission. The copyright holders make no
> + * representations about the suitability of this software for any purpose. It
> + * is provided "as is" without express or implied warranty.
> + *
> + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> + * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> + * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> + * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> + * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
> + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
> + * SOFTWARE.
> + *
> + * Author: Ben Avison (bavi...@riscosopen.org)
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include "utils.h"
> +
> +#ifdef HAVE_GETTIMEOFDAY
> +#include
> +#else
> +#include
> +#endif
> +
> +#define WIDTH  1920
> +#define HEIGHT 1080
> +#define L2CACHE_SIZE (128 * 1024)

Hi,

I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it come from? If it's really an arch/platform-independent constant, maybe some test header could have it?

> +
> +typedef struct
> +{
> +    pixman_fixed_48_16_t    x1;
> +    pixman_fixed_48_16_t    y1;
> +    pixman_fixed_48_16_t    x2;
> +    pixman_fixed_48_16_t    y2;
> +} box_48_16_t;

Looks like Pixman style is using separate statements for struct definition and typedefs.

> +
> +static pixman_bool_t
> +compute_transformed_extents (pixman_transform_t *transform,
> + const pixman_box32_t *extents,
> + box_48_16_t *transformed)

CODING_STYLE is asking for alignment here for the arg names.

> +{
> +    pixman_fixed_48_16_t tx1, ty1, tx2, ty2;
> +    pixman_fixed_t x1, y1, x2, y2;
> +    int i;
> +
> +    x1 = pixman_int_to_fixed (extents->x1) + pixman_fixed_1 / 2;
> +    y1 = pixman_int_to_fixed (extents->y1) + pixman_fixed_1 / 2;
> +    x2 = pixman_int_to_fixed (extents->x2) - pixman_fixed_1 / 2;
> +    y2 = pixman_int_to_fixed (extents->y2) - pixman_fixed_1 / 2;
> +
> +    if (!transform)
> +    {
> +        transformed->x1 = x1;
> +        transformed->y1 = y1;
> +        transformed->x2 = x2;
> +        transformed->y2 = y2;

If there is no transform, why not return the original extents? These have been reduced by a half unit in all four directions.

> +
> +        return TRUE;
> +    }
> +
> +    tx1 = ty1 = INT64_MAX;
> +    tx2 = ty2 = INT64_MIN;
> +
> +    for (i = 0; i < 4; ++i)
> +    {
> +        pixman_fixed_48_16_t tx, ty;
> +        pixman_vector_t v;
> +
> +        v.vector[0] = (i & 0x01) ? x1 : x2;
> +        v.vector[1] = (i & 0x02) ? y1 : y2;
> +        v.vector[2] = pixman_fixed_1;
> +
> +        if (!pixman_transform_point (transform, &v))
> +            return FALSE;
> +
> +        tx = (pixman_fixed_48_16_t) v.vector[0];
> +        ty = (pixman_fixed_48_16_t) v.vector[1];

This works only for affine. This is called affine-bench, so that is natural, yet might be nice to add an assert or something to guarantee no-one will copy this code to something else that uses projective transforms. Or, maybe having "affine" as part of the function name?

> +
> +        if (tx < tx1)
> +            tx1 = tx;
> +        if (ty < ty1)
> +            ty1 = ty;
> +        if (tx > tx2)
> +            tx2 = tx;
> +        if (ty > ty2)
> +            ty2 = ty;
> +    }
> +
> +    transformed->x1 = tx1;
> +    transformed->y1 = ty1;
> +    t