[Gegl-developer] Fwd: [Gimp-developer] gegl-vips

2011-04-17 Thread Øyvind Kolås
-- Forwarded message --
From:  jcup...@gmail.com
Date: Sun, Apr 17, 2011 at 10:22 AM
Subject: [Gimp-developer] gegl-vips
To: gimp-developer gimp-develo...@lists.xcf.berkeley.edu


Hi all,

I've had a stab at a quick hack of gegl-0.1.6 to use libvips (another
demand-driven image processing library) as the backend for batch
processing. I think it is maybe an interesting way to look at gegl
performance, for this use case at least.

https://github.com/jcupitt/gegl-vips

This has some very strong limitations. First, it will not work
efficiently with interactive destructive operations, like painting a
line; that would need area cache invalidation in vips, which is a way
off. Secondly, I've only implemented a few operations (load / crop /
affine / unsharp / save / process), so all you can do is some very
basic batch processing. It should work for dynamic graphs (change the
parameters on a node and just the downstream nodes will recalculate),
but it would need a display node to be able to test that.

There's a README with some more detail on how it works, a test program
and some timings.
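
For reference, the shape of the pipeline the test program builds looks
roughly like this in the GEGL C API (a sketch, not the actual test
program: the crop and affine stages are omitted, and operation and
property names are from memory, so check the repository for the exact
ones):

  #include <gegl.h>

  int
  main (int argc, char **argv)
  {
    GeglNode *gegl, *load, *unsharp, *save;

    gegl_init (&argc, &argv);
    gegl = gegl_node_new ();

    /* source: decode the input image */
    load = gegl_node_new_child (gegl,
                                "operation", "gegl:load",
                                "path", argv[1],
                                NULL);

    /* filter: unsharp mask, processed in linear float light */
    unsharp = gegl_node_new_child (gegl,
                                   "operation", "gegl:unsharp-mask",
                                   NULL);

    /* sink: encode the result */
    save = gegl_node_new_child (gegl,
                                "operation", "gegl:png-save",
                                "path", argv[2],
                                NULL);

    gegl_node_link_many (load, unsharp, save, NULL);
    gegl_node_process (save);   /* pull pixels through the graph */

    g_object_unref (gegl);
    gegl_exit ();
    return 0;
  }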

If I run the test program linked against gegl-0.1.6 on a 5,000 x 5,000
pixel RGB PNG image on my laptop (a c2d at 2.4GHz), I get 96s real,
44s user. I tried experimenting with various settings for GEGL_SWAP
and friends, but I couldn't get it to go faster than that; I probably
missed something. Perhaps gegl's disk cache plus my slow laptop hard
drive are slowing it down.

Linked against gegl-vips with the operations set to exactly match
gegl's processing, the same thing runs in 27s real, 38s user. So it
looks like some tuning of the disk cache, or maybe even turning it off
for batch processing, where you seldom need pixels more than once,
could give gegl a very useful speedup here. libvips has a threading
system which is on by default and does double-buffered write-behind,
which also helps.

If you use uncompressed tiff, you can save a further 15s off the
runtime. libpng compression is slow, and even with compression off,
file write is sluggish.

The alpha channel is not needed in this case, dropping it saves about
5s real time.

babl converts to linear float and back with exp() and log(). Using
lookup tables instead saves 12s.
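
The lookup-table idea, sketched in plain C (this is not babl's actual
code, and the bare 2.4 exponent only approximates the sRGB transfer
curve; it just shows the technique):

  #include <math.h>

  static float u8_to_linear[256];

  static void
  build_lut (void)
  {
    int i;

    /* one powf () per table entry, computed once, instead of
       transcendental maths for every pixel processed */
    for (i = 0; i < 256; i++)
      u8_to_linear[i] = powf (i / 255.0f, 2.4f);
  }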

The gegl unsharp operator is implemented as gblur/sub/mul/add. These
are all linear operations, so you can fold the maths into a single
convolution. Redoing unsharp as a separable convolution saves 1s.
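
In other words, writing I for the input, k for the unsharp amount, G
for the Gaussian kernel, δ for the identity (delta) kernel and ⊗ for
convolution:

  unsharp(I) = I + k * (I - G ⊗ I)
             = ((1 + k) * δ - k * G) ⊗ I

so gblur/sub/mul/add collapse into a single convolution with the
kernel (1 + k) * δ - k * G.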

Finally, we don't really need 16-bit output here, 8 is fine. This
saves only 0.5s for tiff, but 8s for PNG.

Putting all these together, you get the same program running in 2.3s
real, 4s user. This is still using linear float light internally. If
you switch to a full 8-bit path you get 1s real, 1.5s user. I realise
gegl is committed to float, but it's interesting to put a number on
the cost.

Does this sound useful? I think it's maybe a way to weigh the
benefits of the various possible optimisations. I might try running
the tests on a machine with a faster hard disk.

John



-- 
«The future is already here. It's just not very evenly distributed»
                                                 -- William Gibson
http://pippin.gimp.org/                            http://ffii.org/


[Gegl-developer] Fwd: [Gimp-developer] gegl-vips

2011-04-17 Thread Øyvind Kolås
I didn't see that the discussion was on gimp-devel; forwarding my
reply as well, in case someone is following the GEGL developer list or
digging through its archives.

-- Forwarded message --
From: Øyvind Kolås pip...@gimp.org
Date: Sun, Apr 17, 2011 at 2:24 PM
Subject: Re: [Gimp-developer] gegl-vips
To: jcup...@gmail.com
Cc: gimp-developer gimp-develo...@lists.xcf.berkeley.edu


Thank you for taking a serious look at GEGL. I've trimmed away the
bits relating to the VIPS backend to focus instead on the performance
numbers you got, and I will try to explain them.

On Sun, Apr 17, 2011 at 10:22 AM,  jcup...@gmail.com wrote:
> Linked against gegl-vips with the operations set to exactly match
> gegl's processing, the same thing runs in 27s real, 38s user. So it
> looks like some tuning of the disk cache, or maybe even turning it off
> for batch processing, where you seldom need pixels more than once,
> could give gegl a very useful speedup here. libvips has a threading
> system which is on by default and does double-buffered write-behind,
> which also helps.

On my c2d 1.86GHz laptop I get 105s real, 41s user with default
settings. Setting GEGL_SWAP=RAM in the environment to turn off the
disk swapping of tiles makes it run in 43s real, 41s user. With the
default settings GEGL will start swapping when using more than 128MB
of memory for buffers; this limit can be raised by setting, for
instance, GEGL_CACHE_SIZE=1024 to not start swapping until 1GB of
memory is in use, which leads to similar behavior. The tile backend of
GEGL uses reads and writes on the tiles; using mmap() instead could
increase the performance.
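
For a batch program the same thing can be done from the code itself; a
minimal sketch (my assumption being that the variables have to be set
before gegl_init() reads the environment):

  #include <glib.h>
  #include <gegl.h>

  int
  main (int argc, char **argv)
  {
    /* keep tiles in RAM and allow ~1GB of buffer memory;
       set before gegl_init () so the tile backend sees it */
    g_setenv ("GEGL_SWAP", "RAM", TRUE);
    g_setenv ("GEGL_CACHE_SIZE", "1024", TRUE);

    gegl_init (&argc, &argv);
    /* ... build and process the graph ... */
    gegl_exit ();
    return 0;
  }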

> If you use uncompressed tiff, you can save a further 15s off the
> runtime. libpng compression is slow, and even with compression off,
> file write is sluggish.

Loading a PNG into a tiled buffer as used by GeglBuffer is somewhat
bound to be slow. At the moment GEGL doesn't have a native TIFF
loader; if the resources were spent on writing a proper TIFF backend
for GeglBuffer, GEGL would be able to lazily swap the image data in
from TIFF files as needed.

> babl converts to linear float and back with exp() and log(). Using
> lookup tables instead saves 12s.

If the original PNG was 8bit, babl should have a valid fast path that
uses lookup tables to convert it to 32bit linear. For most of the
other conversions involved in this process babl would likely fall back
to reference conversions that go via 64bit floating point and process
each pixel with lots of logic permuting components etc. By
adding/fixing the fast paths in babl to match the reference
conversions, a lot of the time spent converting pixels in this test
should vanish.
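
A fast path is essentially one tight loop that converts a run of
pixels in a single call instead of going through the per-pixel
reference machinery; schematically, in plain C rather than babl's
actual registration API, and reusing the hypothetical u8_to_linear
table sketched earlier in the thread:

  /* convert n R'G'B'A u8 pixels to linear RGBA float in one pass */
  static void
  rgba8_to_rgbaF (const unsigned char *src, float *dst, long n)
  {
    long i;

    for (i = 0; i < n; i++)
      {
        dst[0] = u8_to_linear[src[0]];
        dst[1] = u8_to_linear[src[1]];
        dst[2] = u8_to_linear[src[2]];
        dst[3] = src[3] / 255.0f;   /* alpha is already linear */
        src += 4;
        dst += 4;
      }
  }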

> The gegl unsharp operator is implemented as gblur/sub/mul/add. These
> are all linear operations, so you can fold the maths into a single
> convolution. Redoing unsharp as a separable convolution saves 1s.

For smaller radii this is fine; for larger ones it is not, since the
cost of a direct convolution grows with the kernel size. Ideally GEGL
would be doing whatever is optimal behind the user's back.

> Finally, we don't really need 16-bit output here, 8 is fine. This
> saves only 0.5s for tiff, but 8s for PNG.

Making the test case you used save to 8bit PNG instead gives me 34s
real and 33s user. I am not entirely sure whether babl has a 32bit
float to 8bit nonlinear RGBA conversion; it might just be libpng's
data throughput that makes this difference.

  save = gegl_node_new_child (gegl,
                              "operation", "gegl:png-save",
                              "bitdepth", 8,
                              "path", argv[2],
                              NULL);

> Putting all these together, you get the same program running in 2.3s
> real, 4s user. This is still using linear float light internally. If
> you switch to a full 8-bit path you get 1s real, 1.5s user. I realise
> gegl is committed to float, but it's interesting to put a number on
> the cost.

This type of benchmark really stress-tests the file loading/saving
parts of the code, where I am fully aware that GEGL is far from
optimal, but it is also something that doesn't in any way reflect
GIMP's _current_ use of GEGL, which involves converting 8bit data to
and from float with some very specific formats and then only doing raw
processing. This will of course change in the future.

> Does this sound useful? I think it's maybe a way to weigh the
> benefits of the various possible optimisations. I might try running
> the tests on a machine with a faster hard disk.

It is useful, but it would perhaps be even more useful to see similar
results for a test where the loading/saving is taken out of the
benchmark and only the raw image data crunching is measured.
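
One way to take the I/O out of the measurement would be to time only a
blit of the graph's result into a memory buffer; roughly, given a node
called unsharp at the end of a graph (a fragment, details from
memory):

  GeglRectangle roi = gegl_node_get_bounding_box (unsharp);
  gfloat *buf = g_malloc (roi.width * roi.height * 4 * sizeof (gfloat));

  /* time this call alone: pure graph evaluation, no file I/O */
  gegl_node_blit (unsharp, 1.0, &roi,
                  babl_format ("RGBA float"),
                  buf, GEGL_AUTO_ROWSTRIDE, GEGL_BLIT_DEFAULT);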

Setting GEGL_SWAP=RAM, BABL_TOLERANCE=0.02 in the environment, to make
babl lenient about the error introduced by its fast paths, speeds the
test up further. It should be possible to fix the fast paths in babl
to be correct enough to pass the current, stricter criteria for use,
and thus get these results without lowering standards. Even adding
slightly faster but guaranteed to be correct