On Tuesday 14 September 2010 08:53:37 Soeren Sandmann wrote:
> Siarhei Siamashka <[email protected]> writes:
> > +/* A variant of 'over', which works faster for non-additive blending on the
> > + * platforms which do not have special instructions for saturated addition
> > + */
> > +static force_inline uint32_t
> > +over_a (uint32_t src, uint32_t dest, pixman_bool_t additive_blending)
> > +{
> > +    uint32_t a = ~src >> 24;
> > +    if (additive_blending)
> > +    {
> > +        UN8x4_MUL_UN8_ADD_UN8x4 (dest, a, src);
> > +        return dest;
> > +    }
> > +    else
> > +    {
> > +        UN8x4_MUL_UN8 (dest, a);
> > +        return dest + src;
> > +    }
> > +}
>
> Is there any reason to not just add a boolean "additive_blending" to
> the existing force_inline over() function?

No particular reason. The patch was just self-contained and a bit less intrusive this way (in the sense of having fewer places in the code changed). It was mostly intended as a preview of the idea. The final implementation can indeed be done so that it blends better with the rest of the code. So does it make sense to split the patch into parts, introducing this third argument for the 'over' function first?

> It might also be interesting to add the check as a new
> NOT_SUPER_LUMINESCENT flag and then simply require it for the source
> for all the over_n_*() functions.

I see many reasons *not* to add it as a new flag:

1. It takes one extra flag bit. There are already 24 bits used, with only 8 remaining. We still need some flag(s) for rotation transforms:
   http://lists.freedesktop.org/archives/pixman/2010-August/000420.html
   I expect that compacting bits later may turn out to be tricky, so it may be wise not to waste them in the first place. Extending the flags to a 64-bit variable is possible, but may reduce performance.

2. After introducing this bit, every compositing operation with a solid source will calculate this flag, spending some time on it. But the flag is not needed for many operators (SRC, for example). Also, it is useful only for C fast paths and simple SIMD-incapable processors; everyone else will just take a tiny performance hit.

The 'last mile' check as implemented in my patch should be fine as far as performance is concerned. The only drawback is that whoever implements the fast path functions will be forced to handle all possible types of input data, and cannot be lazy by providing only the NOT_SUPER_LUMINESCENT variant and relying on pixman to fall back to something else when needed.
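
To make the 'last mile' check concrete, here is a minimal standalone sketch of the idea, not the actual patch. The helper names (mul_un8, mul_un8x4_un8, add_un8x4_sat, is_super_luminescent, over_sketch, over_n_8888_sketch) are made up for illustration and stand in for pixman's UN8x4 macros; the pixel layout is assumed to be premultiplied a8r8g8b8. The solid source is tested once, and the per-pixel loop uses a plain 32-bit addition whenever no color channel exceeds alpha.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static inline uint8_t
mul_un8 (uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t) x * a + 0x80;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Scale all four channels of a packed pixel by 'a' (hypothetical helper). */
static inline uint32_t
mul_un8x4_un8 (uint32_t p, uint8_t a)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8)
        r |= (uint32_t) mul_un8 ((uint8_t) (p >> shift), a) << shift;
    return r;
}

/* Per-channel saturating addition done in plain C (hypothetical helper). */
static inline uint32_t
add_un8x4_sat (uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = ((x >> shift) & 0xff) + ((y >> shift) & 0xff);
        r |= (s > 0xff ? 0xff : s) << shift;
    }
    return r;
}

/* Nonzero when any color channel exceeds alpha in a premultiplied pixel. */
static inline int
is_super_luminescent (uint32_t p)
{
    uint32_t a = p >> 24;
    return ((p >> 16) & 0xff) > a || ((p >> 8) & 0xff) > a || (p & 0xff) > a;
}

/* OVER for one pixel; the saturating path is taken only when requested. */
static inline uint32_t
over_sketch (uint32_t src, uint32_t dest, int super_luminescent)
{
    uint8_t  ia = (uint8_t) (~src >> 24);      /* 255 - source alpha */
    uint32_t scaled = mul_un8x4_un8 (dest, ia);

    if (super_luminescent)
        return add_un8x4_sat (scaled, src);
    /* Each channel is at most ia + alpha = 255, so no carry can occur. */
    return scaled + src;
}

/* Solid source over a scanline: the check runs once, outside the loop. */
static inline void
over_n_8888_sketch (uint32_t src, uint32_t *dst, size_t width)
{
    assert (dst != NULL || width == 0);
    int slow = is_super_luminescent (src);
    for (size_t i = 0; i < width; i++)
        dst[i] = over_sketch (src, dst[i], slow);
}
```

The plain addition is safe in the non-super-luminescent case because every scaled destination channel is at most 255 minus the source alpha, and every source channel is at most the source alpha.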

BTW, I like this 'super-luminescent' term :) I tried to search for information about the case when "color components exceed alpha in premultiplied format", and it looked like many people (game developers) know about this thing and its features, but nobody seemed to have a clear single-word definition for it. Searching for "super-luminescent premultiplied" gives some references, all in cairo and freedesktop.org context. Anyway, let's indeed call this thing 'super-luminescent'. I think I need to update the comments in the patch and also the commit message to use it instead of 'additive blending', which I took from:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Premultiplied%20alpha]]

> That would allow similar optimizations for the n_8_565 case and probably the
> n_8888_8888_ca() case as well.

Yes, and 'over_n_8888' could also make use of this optimization (if a C fast path function even gets implemented for it).

> The flag could be set for all the gradients and any time an image is
> opaque.

I'm not quite sure how useful this flag could be for gradients (it would have to be somehow propagated to the scanline combiner function?). But another important operation is over_8888_8888, and it is hard to do anything with it because we don't know whether there are any super-luminescent pixels in the source image. Maybe it makes sense to reconsider how super-luminescent colors are handled in general? When discussing it on IRC the other day, there were even concerns about where such pixels could possibly come from and whether they are used in cairo in any way.

But this stuff is only important for the C implementation and simple processors. So for now I would just go after the simple C fast path functions like over_n_8_8888/over_n_8_0565 and maybe add the rest of the ideas to the TODO list. Probably some people are more motivated to improve pixman performance on simple processors?
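
On the over_8888_8888 point: the difficulty is that, unlike a solid source, an image source would need every pixel inspected before the cheaper addition could be trusted. A rough sketch of what such a scan would look like, assuming premultiplied a8r8g8b8 data (the function name scanline_has_super_luminescent is hypothetical and not part of pixman):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any pixel in the scanline has a color channel greater than
 * its alpha, 0 otherwise. Hypothetical helper, for illustration only. */
static inline int
scanline_has_super_luminescent (const uint32_t *src, size_t width)
{
    assert (src != NULL || width == 0);

    for (size_t i = 0; i < width; i++)
    {
        uint32_t p = src[i];
        uint32_t a = p >> 24;

        if (((p >> 16) & 0xff) > a || ((p >> 8) & 0xff) > a || (p & 0xff) > a)
            return 1;
    }
    return 0;
}
```

This is an extra pass over the source for every composite call, which is why the check looks attractive only for solid sources, where it is done once per operation.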

I'm adding Georgi Beloev to CC just in case because he seems to be interested in MIPS32R2.

--
Best regards,
Siarhei Siamashka
_______________________________________________ Pixman mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/pixman
