Re: [Pixman] [PATCH] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Lennart Sorensen
On Tue, Apr 14, 2015 at 07:20:50PM +0100, Ben Avison wrote:
> On Tue, 14 Apr 2015 11:28:51 +0100, Pekka Paalanen  
> wrote:
> >you asked about reading the cache sizes; I have no idea about that.
> >
> >>+#define PAGE_SIZE (4 * 1024)
> >
> >Hm, sysconf(3) has PAGESIZE query, but I don't know if it is
> >appropriate here. I also tend to forget that Windows might be relevant
> >for Pixman.
> 
> I see the front page of http://www.pixman.org/ says "many platforms,
> including Linux, BSD Derivatives, MacOS X, and Windows" so I expect it
> would be a requirement, yes.

How about using GetSystemInfo on windows and sysconf on the other
platforms?

-- 
Len Sorensen
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Ben Avison

On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen  wrote:

So the matrices are indexed as matrix[row][column] in Pixman?


That's my understanding, yes.


A short'ish comment somewhere to say why you are doing this offsetting
would be nice, and that the offsetting is the reason to allocate a
margin.


Since you've been reworking the patches anyway, do you have enough
information to add the comments yourself or do you want me to do them?


One simple way round this is to apply a 0.5 pixel translation in the
transformation matrix, so that the pattern becomes:

d[0] =1.00 * s[0]
d[1] =0.50 * s[0] + 0.50 * s[1]
d[2] =  1.00 * s[1]
d[3] =  0.50 * s[1] + 0.50 * s[2]

but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't
see why the fudge factor is needed at all, at least not in the affine
case; it's described as being to compensate for rounding errors, but
there is no rounding involved in fixed-point affine transforms.


Maybe the rounding refers to the pixman_fixed_to_int() calls? I could
imagine it is to guarantee that an xmin=0.5 gets really rounded to 0,
and xmax=0.5 gets rounded to 1.


There's a slightly better worked example in the later patch I was
thinking of:

http://lists.freedesktop.org/archives/pixman/2014-September/003380.html

As I say, we're getting a bit ahead of ourselves here, as there were
about 30 patches between the ones we're reworking at the moment and that
one. Søren did give some comments on it at the end of his review here:

http://lists.freedesktop.org/archives/pixman/2014-October/003457.html

which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond
the bitmap bounds if a fast path happened to over-read pixels and then
multiply them by a 0 filter coefficient. I think we can probably cater
for that better by checking whether a bitmap starts or ends at a page
boundary, and whether we're reading the first and last pixels of the
image if it does.


To be honest, both cases you describe sound strange to me. Surely if I
use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
d[0] = 0.5 * s[0] + 0.5 * s[1]
when assuming the box filter (if I got my terminology right)...


You're forgetting it's pixel-centres that are fed through the transform.
To be honest, I think it's already quite a headache for an application to
work out how to set up the transform in order to hit a "cover" scaled
fast path. Assume the reasonable case that you want to plot the whole of
an image of size x,y at a size m,n. You need to set the diagonals of the
transform to

floor(pixman_fixed_1*(x-1)/(m-1))
floor(pixman_fixed_1*(y-1)/(n-1))
1

and then solve for

/ 0.5 \   / 0.5 \
T . | 0.5 | = | 0.5 |
\ 1   /   \ 1   /

to find the translation coefficients that will ensure the top-left source
pixel aligns exactly to the top-left destination pixel.

To then require that the caller also knows that you need to nudge things
along by 8*pixman_fixed_e as well feels like it's requiring too much
knowledge of Pixman's internals to me. I didn't really like putting it in
affine-bench to begin with for this reason, but it doesn't work without
it. So I made do with removing it as quickly as I could - the original
intention was for it to be in the same patch series, but obviously the
patch series has grown too large and unwieldy for that the be the case
any more!

When I've seen timing breakdowns of which Pixman operations are being hit
by applications, I've seen far more scaled plots with repeat type PAD
than I'd really expect. My suspicions are that the bulk of these are down
to how difficult it is for applications to set up the transform for
maximum efficiency.


I think my hope was that since we're processing images of size 1920 *
1080 pixels, you wouldn't have any particular pixels persisting in the
cache from the previous iteration (unless perhaps if you're testing a
high-factor enlargement on a platform that has caches that don't
allocate on write).


Right, so is there any need to flush_cache() at all?


Possibly not. I was taking my cue from lowlevel-blt-bench again, where it
flushes the cache between each type of test. And looking at the
implementation of flush_cache() again now, I note that due to the action
of allocate-on-write and physically mapped caches, it won't completely
flush most caches, only one page-size worth! Perhaps just delete it then.


>> +bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, UINT32_MAX, 
500, &n, &t1, pixman_image_composite_wrapper);
>> +bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, n, 
UINT32_MAX, NULL, &t2, pixman_image_composite_empty);

[...]

I think that part I understood. My comment was about having a {} block
without any flow control statements for it. You are just using it to
scope some variables, which suggests that the block should actually be
a separate function.


Oh right. Well, I think it was m

Re: [Pixman] [PATCH] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Ben Avison

On Tue, 14 Apr 2015 11:28:51 +0100, Pekka Paalanen  wrote:

you asked about reading the cache sizes; I have no idea about that.


+#define PAGE_SIZE (4 * 1024)


Hm, sysconf(3) has PAGESIZE query, but I don't know if it is
appropriate here. I also tend to forget that Windows might be relevant
for Pixman.


I see the front page of http://www.pixman.org/ says "many platforms,
including Linux, BSD Derivatives, MacOS X, and Windows" so I expect it
would be a requirement, yes.

[various places]

Empty line.


Do you just want to add those yourself?

[main]

Hmm. Should I maybe add another, or modify, function to print the list
of recognized pixel formats and operators into utils.c?


Like check-formats.c does, you mean? You could do, but perhaps the usage
might be starting to get rather unwieldy. I usually have a pretty good
idea of what I want to benchmark, but might benefit from a reminder of
the order the parameters need to go in. Perhaps only print the operators
and formats if someone uses an illegal name for one?


Any reason we shouldn't use the same test pattern here as
lowlevel-blt-bench is parsing? Just for the op/src/mask/dst.


Since it exists now, you might as well. I just didn't want to end up with
another enormous table to maintain like in lowlevel-blt-bench, nor did I
want to rule out being able to test things like reversing red/blue order
as part of the operation.


+if ((PIXMAN_FORMAT_R(mask_format) || PIXMAN_FORMAT_G(mask_format) || 
PIXMAN_FORMAT_B(mask_format)))
+pixman_image_set_component_alpha (mask_image, 1);


Ah, but you use a different way to set CA. Any reason to differ from
lowlevel-blt-bench?


Not really. It's just that component alpha operations only make sense
with a mask pixel format that includes RGB components, and it's a waste
of memory and cache space for unified alpha operations to use a mask
pixel format unless it only contains alpha - so I've long felt it's a bit
redundant to have to specify the component/unified flag in addition.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/7] lowlevel-blt-bench test pattern parser

2015-04-14 Thread Ben Avison

On Tue, 14 Apr 2015 09:36:38 +0100,
Pekka Paalanen  wrote:


On Mon, 13 Apr 2015 18:42:45 +0100
"Ben Avison"  wrote:


On Mon, 13 Apr 2015 12:31:35 +0100,
Pekka Paalanen  wrote:
> None of the existing tests has a solid mask for CA tests. I'm not
> sure how much it makes sense.

When you're in component alpha mode, mathematically the source and mask
images are interchangeable. So you could implement over__n__ca
by calling over_n___ca, and we already have a number of fast
paths for operations of the latter form. (I'm not aware that Pixman is
- yet - aware that it can make such substitutions, but I'm sure it
could be added.)


Hmm, yes, interesting, at least in the cases where the operator indeed
allows it... I don't know the operators by heart so I don't know if
there are any that wouldn't work like that. An idea to remember.


Having thought about it some more, and checked the source code (it's
clearer in pixman-combine-float.c because the maths isn't obfuscated by
the need to work in fixed-point) I'm going to retract the claim. Or at
least restrict its applicability. (This demonstrates the danger of there
not being anyone around to contradict me!)

The problem is that although alpha is considered independently for each
colour component of the mask - so you just multiply the source red by the
mask red and so on, I'd neglected the alpha component in the source
image. Yes, because Pixman premultiplies the alpha into the RGB
components, it doesn't directly affect the RGB output of the combined
source+mask image, but there's a secondary output, a per-channel alpha,
which affects the Porter-Duff operation. For example, with an OVER
operation, the inverse of the per-channel alpha is multiplied into the
existing value from the destination buffer before you add-saturate the
combined source+mask into it.

My statement about being able to exchange the source and mask images is
true, but only if the alpha component is 1 in both images. For example,
over_0565_n__ca or over_x888_n__ca where (in both cases)
n.a == 1 would both be suitable candidates for exchanging the source and
mask.

For the avoidance of doubt, the Porter-Duff and PDF combiners are all
just binary operators, defined in terms of one source and one destination
image. Whenever Pixman talks about unified or component alpha operations,
it's just concerned with how the source and mask images are combined into
the effective source image for the fundamental combiner. It just happens
to be more efficient in most cases to do both the source/mask and source/
destination calculations at the same time, rather than writing out the
intermediate image into a buffer somewhere.


So, can I take it that you gave your Reviewed-by for the whole series?


Yes.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Pekka Paalanen
On Wed, 08 Apr 2015 02:27:48 +0100
"Ben Avison"  wrote:

> On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen  
> wrote:

> > If there is no transform, why not return the original extents?
> > These have been reduced by a half unit in all four directions.
> 
> It's more obviously relevant in the bilinear scaling case, but a pixel's
> colour is considered to apply to the midpoint of each pixel. Thus an
> image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ...
> (N-1.5), (N-0.5). Therefore, when you're working out which source pixels
> are required to generate a range of output pixels, it's 0.5 and (N-0.5)
> that you need to feed through any transformation matrix. If the no
> transform case were special-cased, you'd then need to special-case it
> again in the calling function because the 0.5 offset applies (albeit at
> different scaling factors) to both source and destination images, so the
> caller will be adding the 0.5 pixel border back on again.

Ok, so the extents going in to compute_transformed_extents() have a
different meaning than the ones coming out. The ones going in define
the edges of the image, while the ones coming out are reduced to the
corner pixel centers.

Which means that it is incorrect to, say, compute extents for two
sequential transforms one by one but you must combine the transforms
first and then call compute_transformed_extents() just once.

I was just surprised by this convention.

> In affine-bench, I calculate image sizes based on bilinear scaling for
> simplicity, since a source or mask image big enough to satisfy COVER_CLIP
> for bilinear scaling will also do so for nearest scaling.
> 
> [compute_transformed_extents]
> >> +tx1 = ty1 = INT64_MAX;
> >> +tx2 = ty2 = INT64_MIN;
> >> +
> >> +for (i = 0; i < 4; ++i)
> >> +{
> >> +pixman_fixed_48_16_t tx, ty;
> >> +pixman_vector_t v;
> >> +
> >> +v.vector[0] = (i & 0x01)? x1 : x2;
> >> +v.vector[1] = (i & 0x02)? y1 : y2;
> >> +v.vector[2] = pixman_fixed_1;
> >> +
> >> +if (!pixman_transform_point (transform, &v))
> >> +return FALSE;
> >> +
> >> +tx = (pixman_fixed_48_16_t)v.vector[0];
> >> +ty = (pixman_fixed_48_16_t)v.vector[1];
> >
> > This works only for affine. This is called affine-bench, so that is
> > natural, yet might be nice to add an assert or something to guarantee
> > no-one will copy this code to something else that uses projective
> > transforms.
> 
> I might be missing something, but it looks to me like
> pixman_transform_point (through its subroutine
> pixman_transform_point_31_16) does handle projective transforms, and what
> this code segment is doing is iterating over the four corners of the
> destination rectangle and finding the maximum and minimum source x/y by
> considering the transformed position of each corner independently. Surely
> the whole set of destination pixels, once passed through even a
> projective matrix, must still lie within this bounding box?

Ahh, another case where I had wrong assumptions.
pixman_transform_point() indeed does guarantee, that v.vector[2] is
1.0. I'm used to assume it's not guaranteed because the guarantee
leads to a division by zero for points at infinity. But Pixman cannot
deal with points at infinity anyway, so it sort of makes sense in
relation to pixel images.

> [flush_cache]
> >> +volatile const char *x = clean_space;
> [...]
> >> +(void) *x;
> >
> > Are you sure this actually does the read? That the compiler is not
> > allowed to discard the read? That's something I'm never sure of.
> 
> That's the point of the volatile qualifier - the compiler isn't allowed
> to throw away accesses to an object which is declared volatile, it has to
> perform exactly as many accesses as you ask for and in the same order.
>  From the C standard:
> 
> "A volatile declaration may be used to describe an object
> corresponding to a memory-mapped input/output port or an object
> accessed by an asynchronously interrupting function. Actions on
> objects so declared shall not be 'optimized out' by an
> implementation or reordered except as permitted by the rules for
> evaluating expressions."
> 
> What you do need to be careful of these days is that this says nothing
> about the number of or ordering of accesses as viewed by any part of the
> system beyond the L1 cache, including any other cores - that's where
> memory barriers come in. But that's not relevant for the purposes of
> cleaning the cache.

Very good.

> >> +x += 32;
> >
> > Why 32? Is this CACHELINE_LENGTH from lowlevel-blt-bench.c?
> 
> Yes, that's the origin. You might argue for reading the cacheline length
> at runtime, as with cache and page size, but it's less critical here. If
> your cacheline length was actually 64 bytes, then reading every 32nd byte
> would still pull in every cacheline. Reading every single byte would have
> the same effect but be a bit of a waste of time; t

Re: [Pixman] [PATCH] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Pekka Paalanen
On Wed,  8 Apr 2015 14:20:09 +0100
Ben Avison  wrote:

> ---
>  .gitignore|1 +
>  test/Makefile.sources |1 +
>  test/affine-bench.c   |  394 
> +
>  3 files changed, 396 insertions(+), 0 deletions(-)
>  create mode 100644 test/affine-bench.c
> 
> diff --git a/.gitignore b/.gitignore
> index 0f11496..7da6b6f 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -46,6 +46,7 @@ demos/tri-test
>  pixman/pixman-srgb.c
>  pixman/pixman-version.h
>  test/a1-trap-test
> +test/affine-bench
>  test/affine-test
>  test/alpha-loop
>  test/alphamap
> diff --git a/test/Makefile.sources b/test/Makefile.sources
> index c20c34b..8b0e855 100644
> --- a/test/Makefile.sources
> +++ b/test/Makefile.sources
> @@ -37,6 +37,7 @@ OTHERPROGRAMS = \
>   radial-perf-test\
>  check-formats   \
>   scaling-bench   \
> + affine-bench\
>   $(NULL)
>  
>  # Utility functions
> diff --git a/test/affine-bench.c b/test/affine-bench.c
> new file mode 100644
> index 000..720d066
> --- /dev/null
> +++ b/test/affine-bench.c
> @@ -0,0 +1,394 @@
> +/*
> + * Copyright © 2014 RISC OS Open Ltd
...
> + * Author:  Ben Avison (bavi...@riscosopen.org)
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "utils.h"
> +
> +#ifdef HAVE_GETTIMEOFDAY
> +#include 
> +#else
> +#include 
> +#endif
> +
> +#define WIDTH  1920
> +#define HEIGHT 1080
> +#define MAX_L2CACHE_SIZE (8 * 1024 * 1024) /* How much data to read to flush 
> all cached data to RAM */

Hi,

you asked about reading the cache sizes; I have no idea about that.

> +#define PAGE_SIZE (4 * 1024)

Hm, sysconf(3) has PAGESIZE query, but I don't know if it is
appropriate here. I also tend to forget that Windows might be relevant
for Pixman.

I think this #define is just fine for now. It clearly documents what
the number is.

> +
> +typedef struct box_48_16 box_48_16_t;

...

> +static void
> +create_image (uint32_t   width,
> +  uint32_t   height,
> +  pixman_format_code_t   format,
> +  pixman_filter_tfilter,
> +  const pixman_transform_t  *t,
> +  uint32_t **bits,
> +  pixman_image_t   **image)
> +{
> +uint32_t stride = (width * PIXMAN_FORMAT_BPP(format) + 31) / 32 * 4;

Space before opening paren.

> +
> +*bits = aligned_malloc (PAGE_SIZE, stride * height);
> +memset (*bits, 0xCC, stride * height);
> +*image = pixman_image_create_bits (format, width, height, *bits, stride);
> +pixman_image_set_repeat (*image, PIXMAN_REPEAT_NORMAL);
> +pixman_image_set_filter (*image, filter, NULL, 0);
> +}
> +
> +/* This needs to match the shortest cacheline length we expect to encounter 
> */
> +#define CACHE_CLEAN_INCREMENT 32
> +
> +static void
> +flush_cache (void)
> +{
> +static const char clean_space[MAX_L2CACHE_SIZE];
> +volatile const char *x = clean_space;
> +const char *clean_end = clean_space + sizeof clean_space;

Empty line.

> +while (x < clean_end)
> +{
> +(void) *x;
> +x += CACHE_CLEAN_INCREMENT;
> +}
> +}
> +
> +/* Obtain current time in microseconds modulo 2^32 */
> +uint32_t
> +gettimei (void)
> +{
> +#ifdef HAVE_GETTIMEOFDAY
> +struct timeval tv;
> +
> +gettimeofday (&tv, NULL);
> +return tv.tv_sec * 100 + tv.tv_usec;
> +#else
> +return (uint64_t) clock () * 100 / CLOCKS_PER_SEC;
> +#endif
> +}

...

> +int
> +parse_fixed_argument (char *arg, pixman_fixed_t *value)
> +{
> +char *tailptr;

Empty line.

> +*value = pixman_double_to_fixed (strtod (arg, &tailptr));

Empty line.

> +return *tailptr == '\0';
> +}
> +
> +int
> +parse_arguments (int   argc,
> + char *argv[],
> + pixman_transform_t   *t,
> + pixman_op_t  *op,
> + pixman_format_code_t *src_format,
> + pixman_format_code_t *mask_format,
> + pixman_format_code_t *dest_format)
> +{
> +if (!parse_fixed_argument (*argv, &t->matrix[0][0]))
> +return 0;
> +
> +if (*++argv == NULL)
> +return 1;
> +
> +if (!parse_fixed_argument (*argv, &t->matrix[0][1]))
> +return 0;
> +
> +if (*++argv == NULL)
> +return 1;
> +
> +if (!parse_fixed_argument (*argv, &t->matrix[1][0]))
> +return 0;
> +
> +if (*++argv == NULL)
> +return 1;
> +
> +if (!parse_fixed_argument (*argv, &t->matrix[1][1]))
> +return 0;
> +
> +if (*++argv == NULL)
> +return 1;
> +
> +*op = operator_from_string (*argv);
> +if (*op == PIXMAN_OP_NONE)
> +return 0;
> +
> +if (*++argv == NULL)
> +return 1;
> +
> +*src_format = format_from_string (*argv);
> +if (*src_format == PIXMAN_null)
> +return 0;
> +
> +++

Re: [Pixman] [PATCH 0/7] lowlevel-blt-bench test pattern parser

2015-04-14 Thread Pekka Paalanen
On Mon, 13 Apr 2015 18:42:45 +0100
"Ben Avison"  wrote:

> On Mon, 13 Apr 2015 12:31:35 +0100, Pekka Paalanen  
> wrote:
> > Apart from restructuring, there is one significant(?) difference to
> > Ben's patches: for a solid mask, Ben used a8r8g8b8, but my code uses a8.
> 
> I had to remind myself how things hang together to check if this is
> significant or not...
> 
> When lowlevel-blt-bench tests a solid image (source or mask), it achieves
> it not by calling pixman_image_create_solid_fill() but by creating a 1x1
> pixel bitmap using pixman_image_create_bits(). However, the effect is
> effectively the same because compute_image_info() detects single-pixel
> bitmaps and records it in image->common.extended_format_code before we
> start doing fast path lookup. However, image->type remains as BITS rather
> than SOLID because the non-common parts of the pixman_image union retain
> their original meaning - for a native solid fill image, it's just the
> colour in three different formats, but for bitmap images it's a pointer
> to and sizes of the bitmap and colour LUT (if applicable), various helper
> function pointers and so on.
> 
> A typical ARM assembly fast path is written to assume that any argument
> corresponding to a solid colour is already in an  format where the
> red-blue ordering is the same as the destination image. The magic that
> ensures this is hidden in the wrapper C function generated by
> PIXMAN_ARM_BIND_FAST_PATH_N_DST and similar, where it calls
> _pixman_image_get_solid(). You'll see similar calls more explicitly in
> pixman-fast-path.c (architecture-neutral fast paths) and the equivalent
> sources files for other architectures.
> 
> There are a few circumstances where you might not want to use
> _pixman_image_get_solid()  - perhaps most obviously are the operations
> which use a solid image in pixel format a2r10g10b10 because you'd lose 2
> bits of precision per colour component. There are a few example of those
> amongst those you picked out in special_patterns[].
> 
> It's worth noting that _pixman_image_get_solid() has optimised code to
> extract the colour from image->type==BITS images if the source/mask image
> is of format a8r8g8b8, x8r8g8b8 or a8. Other formats will still work, but
> more laboriously.
> 
> Not that any of this should matter at all for the purposes of
> lowlevel-blt-bench. _pixman_image_get_solid() is only called once per
> call of pixman_image_composite(), so it is part of the overhead that
> should be being accounted for by the code wrapped in #if EXCLUDE_OVERHEAD.

Ok, so if I understood you right, this difference should be harmless,
especially with the addition mentioned below.

> > should we also add a condition, that if a test has CA flag and a solid
> > mask, then that mask needs to be a8r8g8b8?
> 
> That might be desirable, if only because lowlevel-blt-bench initialises
> all its images using memset(0xCC) so an a8 solid image would be converted
> by _pixman_image_get_solid() to 0xCC00 whereas an a8r8g8b8 would be
> 0x. When you're not in component alpha mode, only the alpha byte
> matters for the mask image, but in the case of component alpha
> operations, a fast path might decide that it can save itself a lot of
> multiplications if it spots that 3 constant mask components are already 0.

Ok, good. I think I will make that a separate follow-up patch, so I can
properly record the reasons and effects.

> > None of the existing tests has a solid mask for CA tests. I'm not sure
> > how much it makes sense.
> 
> When you're in component alpha mode, mathematically the source and mask
> images are interchangeable. So you could implement over__n__ca by
> calling over_n___ca, and we already have a number of fast paths
> for operations of the latter form. (I'm not aware that Pixman is - yet -
> aware that it can make such substitutions, but I'm sure it could be
> added.)

Hmm, yes, interesting, at least in the cases where the operator indeed
allows it... I don't know the operators by heart so I don't know if
there are any that wouldn't work like that. An idea to remember.

> > Ben, when we eventually get to the actual optimization patches, I expect
> > I will reproduce the benchmarks myself while reviewing. Is that ok?
> 
> That's definitely fine by me, thanks for offering. Fingers crossed it
> doesn't substantially change the results :-)
> 
> > What do you think of this series?
> 
> In general, it looks like an improvement to me. I don't think the test
> programs have had this much attention in a while! One minor point in
> patch 6's log message, you copied my typo:
> 
>_[_[_ca]
> 
> should be
> 
>_[_[_ca]

Ahh, yes, I'll fix that.

So, can I take it that you gave your Reviewed-by for the whole series?

For the record, if the terminology of S-o-b, R-b, Acked-by, etc. is
foreign to you or anyone, they are documented in the Linux kernel:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/Submitti