I also hit this issue in the last week or so, doing almost exactly the
same thing (except mine was a sharpening filter). It also totally
locked my system up so I had to give it a hard reboot.
As Troy said, the loop gets unrolled anyway, and with limits of -10 to
10 that is 21 x 21 = 441 iterations per pixel, which is very likely
going to be slow on any decent-sized texture. It's really worth seeing
if you can get away with a smaller number of iterations, or at least
optimise it as much as possible.
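To put rough numbers on the unrolling, here is a minimal Python sketch (not kernel code) that counts how many loop bodies the posted kernel expands to per pixel. The function names are made up for illustration, and it only counts loop bodies; what the compiler and driver actually do with them is not something this models.

```python
def unrolled_sample_count(limit):
    """Loop bodies emitted when both loops run from -limit to +limit inclusive."""
    side = 2 * limit + 1
    return side * side


def samples_inside_disk(limit, radius):
    """How many of those offsets pass the test dot(delta, delta) < radius * radius."""
    rad_sq = radius * radius
    return sum(
        1
        for i in range(-limit, limit + 1)
        for j in range(-limit, limit + 1)
        if i * i + j * j < rad_sq
    )


for limit in (10, 20):
    # limit 10 gives 441 loop bodies, limit 20 gives 1681
    print(limit, unrolled_sample_count(limit))
```

Going from 10 to 20 is a bit under 4 times as many loop bodies per pixel, and every one of them is emitted whether or not its offset falls inside the radius.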
In my case, I found 12 well-chosen samples were good enough, and I set
the 12 sample coordinates manually to avoid the extra computation
needed to calculate them (this might happen anyway as a compiler
optimisation, but it won't do any harm). Another thing that might help
a little is to move anything that's calculated once for the whole
image, rather than per pixel, out of the shader altogether. For
example, you have the line "float radSq = radius * radius;". You could
do this in a Math patch and pass the result in, so it's not evaluated
per pixel (again, this might get optimised out by the compiler
anyway).
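As a rough illustration of fixing the sample positions by hand, here is a Python sketch that generates 12 offsets once, outside the kernel, and prints them as vec2 literals that could be pasted in. The ring layout, the radius of 4.0 and the function names are assumptions for the example; they are not the 12 samples I actually used.

```python
import math


def ring_offsets(count, radius):
    """Return `count` (x, y) offsets evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2.0 * math.pi * k / count),
         radius * math.sin(2.0 * math.pi * k / count))
        for k in range(count)
    ]


def as_kernel_constants(offsets):
    """Format offsets as vec2 literals that could be pasted into a kernel by hand."""
    return ["vec2(%.3f, %.3f)" % (x, y) for x, y in offsets]


radius = 4.0  # assumed value, purely for illustration
rad_sq = radius * radius  # computed once per image here, not once per pixel
for literal in as_kernel_constants(ring_offsets(12, radius)):
    print(literal)
```

The kernel then only needs 12 sample() calls with constant offsets, and a value like rad_sq can be passed in as an input rather than recomputed for every pixel.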
Chris
On 10 Aug 2009, at 02:05, Troy Koelling wrote:
The GPU can only handle a certain number of instructions in one go.
More than that, and you'll have performance problems at best, and at
worst you'll hit driver bugs.
Loops in Core Image are compiler candy; they are entirely unrolled
before upload to the GPU.
On Aug 8, 2009, at 6:19 PM, James Walker wrote:
I haven't been able to find much on performance or limitations of
Core Image kernels beyond the statement that you can't use variable
loop limits. I thought I could just use fixed limits bigger than
needed, like so:
kernel vec4 blur(sampler image, float radius)
{
    vec2 basePos = destCoord();
    int i, j;
    vec4 sum = vec4(0.0);
    float radSq = radius * radius;
    vec2 delta;

    for (i = -10; i <= 10; ++i)
    {
        delta.x = float(i);
        for (j = -10; j <= 10; ++j)
        {
            delta.y = float(j);
            sum += ((dot(delta, delta) < radSq)?
                sample( image, basePos + delta ) :
                vec4(0));
        }
    }
    sum.rgb /= sum.a;
    sum.a = 1.0;
    return sum;
}
As is, this works, albeit more slowly than I would like. But what
I don't understand is that if I change those loop limits from 10s
to 20s, it doesn't just take 4 times longer; weird stuff happens.
The whole screen flickers, Quartz Composer hangs, and then the
whole machine hangs, requiring a forced restart. What's up with
that?
By the way, someone will doubtless notice that I am essentially
doing a disk blur here, and point out that Apple ships a disk blur
filter. What I ultimately want to do is more complicated; this is
just a learning exercise along the way.
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Quartzcomposer-dev mailing list ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/quartzcomposer-dev/tkoelling%40apple.com
This email sent to [email protected]