I also hit this issue in the last week or so, doing almost exactly the same thing (except mine was a sharpening filter). It also totally locked my system up so I had to give it a hard reboot.

As Troy said, the loop gets unrolled anyway, and with limits of -10 to 10 on both loops you are doing 21 x 21 = 441 iterations per pixel, which is very likely going to be slow on any decent-sized texture. It's really worth seeing if you can get away with a smaller number of iterations, or at least optimise it as much as possible.

In my case, I found 12 well-chosen samples were good enough, and I set the 12 sample coordinates manually to avoid the extra computation needed to calculate them (this might happen anyway as a compiler optimisation, but it does no harm). Another thing that might help a little is to move anything that's calculated once for the whole image, rather than per pixel, out of the shader altogether. E.g. you have the line "float radSq = radius * radius;" - you could do this in a math patch and pass the result in, so it's not evaluated per pixel (again, this might get optimised out by the compiler anyway).
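To make the "work it out once, outside the shader" idea concrete, here is a rough CPU-side sketch in Python rather than Core Image kernel code. It squares the radius once and lists the offsets that pass the same test as the kernel (dot(delta, delta) < radSq). The name disc_offsets and its limit parameter are made up for illustration; nothing here is a Core Image or Quartz Composer API.

```python
def disc_offsets(radius: float, limit: int = 10) -> list[tuple[int, int]]:
    """Return the integer (dx, dy) offsets strictly inside a disc of the given radius.

    Hypothetical helper: it mirrors the kernel's test, but runs once on the CPU
    instead of once per pixel.
    """
    rad_sq = radius * radius  # hoisted: computed once, not per pixel
    return [
        (dx, dy)
        for dx in range(-limit, limit + 1)
        for dy in range(-limit, limit + 1)
        if dx * dx + dy * dy < rad_sq
    ]


if __name__ == "__main__":
    offsets = disc_offsets(2.0)
    print(len(offsets), "of", 21 * 21, "loop iterations actually sample the image")
```

With a small radius, most of the 441 iterations contribute nothing, which is one reason a short hand-picked list of offsets can be so much cheaper.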

Chris



On 10 Aug 2009, at 02:05, Troy Koelling wrote:

The GPU can only handle a certain number of instructions in one go. More than that, and you'll have performance problems at best, and at worst you'll hit driver bugs.

Loops in Core Image are compiler candy: they are entirely unrolled before upload to the GPU.
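As a back-of-the-envelope illustration of what unrolling means for the kernel in question, this Python sketch counts how many sample() calls the two nested loops expand to for a given limit. The function name unrolled_samples is made up for this example, and the count covers only the sample() calls, not the actual GPU instruction total, which depends on the compiler and driver.

```python
def unrolled_samples(limit: int) -> int:
    """Number of sample() calls once both loops from -limit to +limit are fully unrolled."""
    per_axis = 2 * limit + 1  # -limit..+limit inclusive
    return per_axis * per_axis


if __name__ == "__main__":
    for limit in (10, 20):
        print(f"limit {limit}: {unrolled_samples(limit)} sample() calls per pixel")
```

Going from limits of 10 to limits of 20 takes the unrolled body from 441 to 1681 sample() calls, each with its own comparison and add.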


On Aug 8, 2009, at 6:19 PM, James Walker <[email protected]> wrote:

I haven't been able to find much on performance or limitations of Core Image kernels beyond the statement that you can't use variable loop limits. I thought I could just use fixed limits bigger than needed, like so:


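kernel vec4 blur(sampler image, float radius)
{
  vec2  basePos = destCoord();    // position of the pixel being computed
  int   i, j;
  vec4  sum = vec4(0.0);
  float radSq = radius * radius;
  vec2  delta;

  // Fixed loop limits, since variable limits are not allowed.
  for (i = -10; i <= 10; ++i)
  {
      delta.x = float(i);
      for (j = -10; j <= 10; ++j)
      {
          delta.y = float(j);
          // Only samples inside the disc of the given radius contribute.
          sum += ((dot(delta, delta) < radSq) ?
              sample( image, basePos + delta ) :
              vec4(0));
      }
  }

  // Normalise by the accumulated alpha and force the result opaque.
  sum.rgb /= sum.a;
  sum.a = 1.0;

  return sum;
}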


As is, this works, albeit more slowly than I would like. But what I don't understand is that if I change those loop limits from 10s to 20s, it doesn't just take 4 times longer; weird stuff happens. The whole screen flickers, Quartz Composer hangs, and then the whole machine hangs, requiring a forced restart. What's up with that?

By the way, someone will doubtless notice that I am essentially doing a disk blur here, and point out that Apple ships a disk blur filter. What I ultimately want to do is more complicated; this is just a learning exercise along the way.

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Quartzcomposer-dev mailing list ([email protected] )
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/quartzcomposer-dev/tkoelling%40apple.com

This email sent to [email protected]