Re: [UFRaw-Devel] Severe performance issues when rendering previews in UFRaw 0.15

Andreas Sandberg Wed, 14 Oct 2009 15:50:28 -0700

On Tue, 2009-10-13 at 22:22 -0600, Bruce Guenter wrote:
> On Tue, Oct 13, 2009 at 09:51:25AM +0200, Andreas Sandberg wrote:
> > From just checking the relevant pieces of the source code, it seems like
> > the changes to the OpenMP code in render_preview_image are fairly
> > similar to the changes I made to my local copy. The code in CVS is
> > quite a bit less hackish though. :) I noticed the critical section
> > around choose_subarea; I solved that issue the same way in my version,
> > but performance would probably benefit if we could redesign the code so
> > that the critical section can be avoided.
> 
> The critical section is pretty much unavoidable as far as I could tell
> when I made the OpenMP changes.  The only way I can think of to work
> around it is for each thread to effectively calculate what every thread
> would process, and then pick the Nth tile that would be assigned to it.
> That critical section is also called a maximum of 32+N times per redraw,
> where N is the number of threads.  Considering how much else is
> calculated per tile, the locking should not be a significant overhead,
> and the complexity of such an approach looks to be much greater (read:
> more bugs :/ ).


I guess one possibility would be to find the areas prior to starting the
parallel region. I don't know if this would noticeably improve the user
experience, though. In fact, I hardly notice any difference between
running with 1 thread and running with 4 threads (I'm on a Core2 Quad
Q9450).
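
A rough sketch of what "find the areas first" could look like. This is
not the UFRaw code: next_dirty_tile() and render_tile() are hypothetical
stand-ins for choose_subarea() and the per-tile work, and MAX_TILES is
only assumed to match the 32 subareas mentioned above. The work list is
built serially, so the parallel loop needs no critical section.

```c
#include <assert.h>

#define MAX_TILES 32  /* assumption: upper bound on tiles per redraw */

/* Hypothetical stand-in for choose_subarea(): index of the next tile
 * at or after `start` that needs rendering, or -1 if none is left. */
static int next_dirty_tile(const int *dirty, int n, int start)
{
    for (int i = start; i < n; i++)
        if (dirty[i])
            return i;
    return -1;
}

/* Hypothetical stand-in for the real per-tile rendering work. */
static void render_tile(int *out, int tile)
{
    out[tile] = tile * 2 + 1;
}

/* Renders every dirty tile and returns how many tiles were processed. */
int render_dirty_tiles(const int *dirty, int n, int *out)
{
    int queue[MAX_TILES];
    int count = 0;

    assert(n <= MAX_TILES);

    /* Serial phase: collect the work list once, no locking required. */
    for (int t = next_dirty_tile(dirty, n, 0); t >= 0;
         t = next_dirty_tile(dirty, n, t + 1))
        queue[count++] = t;

    /* Parallel phase: each iteration touches a distinct tile. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < count; i++)
        render_tile(out, queue[i]);

    return count;
}
```

The catch is that a precomputed list cannot react if the set of tiles
that need rendering changes while the loop is running, which may be one
reason for choosing tiles on the fly.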

> > I noticed that the parallelization of develop() is still pretty much the
> > same; I still think that it is too fine-grained. It's probably a
> > better idea to parallelize at a higher level over larger blocks of the
> > image. However, I might have misunderstood how develop() is used, my
> > impression is that it is called for a fairly small region of the image
> > at a time.
> 
> develop() is called from 2 places (ignoring the calls that use it for a
> single pixel): to develop the preview and to write the final image.
> 
> The call to develop() during preview comes through
> ufraw_convert_image_area which does no batching.  However, it itself is
> called from within render_preview_image, which does the tiling in
> parallel, so the batching is already handled.  For a 10MP camera, those
> tiles will be ~300,000 pixels per call.

Doesn't the 'omp parallel for' in develop() create additional threads
when called from render_preview_image()? I know that OpenMP is supposed
to support nested parallelism, but I don't know if it is enabled by
default.
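
One way to check this empirically is to open a parallel region inside
another one and ask how many threads the inner team received. The sketch
below does that; inner_team_size() is a made-up helper name, and the
function falls back to 1 when built without OpenMP. As far as I know,
nested parallelism was disabled by default in the OpenMP versions
current at the time, in which case the inner region runs with a single
thread, but the result depends on the runtime and its environment
settings.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Opens a parallel region inside another one and reports how many
 * threads the inner team actually received. */
int inner_team_size(void)
{
    int inner = 1;  /* serial fallback when compiled without OpenMP */

#ifdef _OPENMP
    #pragma omp parallel num_threads(2)
    {
        /* Only the outer master opens the nested region. */
        #pragma omp master
        {
            #pragma omp parallel num_threads(2)
            {
                /* A single writer, so there is no race on `inner`. */
                #pragma omp master
                inner = omp_get_num_threads();
            }
        }
    }
#endif

    assert(inner >= 1);  /* a team always has at least one thread */
    return inner;
}
```

A result of 1 means the nested 'parallel for' is effectively serialized;
a result of 2 means the runtime really did create additional threads.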

> The writers that call develop() do it in batches of 64 rows at a time
> (see the use of DEVELOP_BATCH in ufraw_write_image_data).  For the 10MP
> camera referenced above, that means roughly 200,000 pixels per call.  I
> did some benchmarking on my system (Core2Duo) before I settled on 64.
> Less than 32 was definitely slower, but nothing was gained going over
> 64.

It seems like ufraw_write_image_data() parallelizes over the rows of the
image. If ufraw_write_image_data() and render_preview_image() are both
parallelized, what is the reason for having a 'parallel for' in the
develop() procedure?

I did some measurements on a batch conversion of 19 raw images (6 MP)
and got the following data (best time from two runs, CVS HEAD):

Threads  Run time (s)  Speedup
      1         46.66     1.00
      2         33.53     1.39
      3         26.14     1.79
      4         26.86     1.74
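
For reference, the speedup column is the 1-thread time divided by the
N-thread time. Dividing that by N gives the parallel efficiency, which
makes the flattening easier to see. A small sketch of the arithmetic
(the function names are just for illustration):

```c
#include <assert.h>

/* Speedup relative to the single-thread run time. */
double speedup(double t1, double tn)
{
    return t1 / tn;
}

/* Parallel efficiency: speedup divided by the number of threads.
 * 1.0 would mean perfect scaling. */
double efficiency(double t1, double tn, int threads)
{
    assert(threads > 0);
    return t1 / (tn * threads);
}
```

With the times above, efficiency(46.66, 33.53, 2) is about 0.70,
efficiency(46.66, 26.14, 3) about 0.60 and efficiency(46.66, 26.86, 4)
about 0.43.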

I would expect the speedup to decrease further when going beyond 4
threads, but I don't have easy access to systems with more cores at the
moment. I haven't really had time to analyze where the bottlenecks are,
but I guess that makes for a nice project for a rainy day... :)

Another thing I noticed: ufraw_shave_hotpixels uses 'omp atomic' to
update the hot pixel count. Atomic instructions are generally expensive.
As far as I can tell, an OpenMP reduction clause would work fine instead.
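
To illustrate the difference, here are two versions of a simplified
counting loop. This is not the actual ufraw_shave_hotpixels code: the
threshold test and both function names are made up for the example. With
'reduction(+:count)' each thread increments a private copy, and the
copies are summed once when the loop ends.

```c
#include <assert.h>
#include <stddef.h>

/* Version 1: every hit updates the shared counter atomically. */
int count_hot_atomic(const unsigned short *pix, size_t n,
                     unsigned short threshold)
{
    int count = 0;
    assert(pix != NULL || n == 0);

    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        if (pix[i] > threshold) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

/* Version 2: per-thread private counters, combined at the end. */
int count_hot_reduction(const unsigned short *pix, size_t n,
                        unsigned short threshold)
{
    int count = 0;
    assert(pix != NULL || n == 0);

    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < (long)n; i++) {
        if (pix[i] > threshold)
            count++;
    }
    return count;
}
```

Both return the same count. If hot pixels are rare, the atomic update is
rarely executed, so the measurable gain may be small.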

//Andreas



