On Tue, Oct 13, 2009 at 09:51:25AM +0200, Andreas Sandberg wrote:
> From just checking the relevant pieces of the source code, it seems like
> the changes to the OpenMP code in render_preview_image are fairly
> similar to the changes I made to my local copy. The code in the CVS is a
> quite a bit less hackish though. :) I noticed the critical section
> around choose_subarea, I solved that issue the same way in my version,
> but performance would probably benefit if we could redesign the code so
> that the critical section can be avoided.

As far as I could tell when I made the OpenMP changes, the critical
section is pretty much unavoidable. The only way I can think of to work
around it is for each thread to effectively calculate what every thread
would process, and then pick the Nth tile that would be assigned to it.
That critical section is also entered a maximum of 32+N times per
redraw, where N is the number of threads. Considering how much else is
calculated per tile, the locking should not be a significant overhead,
and the complexity of such an approach looks to be much greater (read:
more bugs :/ ).

> I noticed that the parallelization of develop() is still pretty much the
> same, I still think that the it is too fine-grained. It's probably a
> better idea to parallelize at a higher level over larger blocks of the
> image. However, I might have misunderstood how develop() is used, my
> impression is that it is called for a fairly small region of the image
> at a time.

develop() is called from 2 places (ignoring the calls that use it for a
single pixel): to develop the preview and to write the final image.

The call to develop() during preview comes through
ufraw_convert_image_area, which does no batching. However, that function
is itself called from within render_preview_image, which does the tiling
in parallel, so the batching is already handled. For a 10MP camera,
those tiles will be ~300,000 pixels per call.

The writers that call develop() do it in batches of 64 rows at a time
(see the use of DEVELOP_BATCH in ufraw_write_image_data). For the 10MP
camera referenced above, that means roughly 200,000 pixels per call. I
did some benchmarking on my system (Core2Duo) before I settled on 64.
Less than 32 was definitely slower, but nothing was gained going over
64.

-- 
Bruce Guenter <[email protected]> http://untroubled.org/
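
To make the locking pattern discussed above concrete, here is a rough
sketch of it, not the actual UFRaw code. claim_next_tile() and
process_tile() are hypothetical stand-ins for choose_subarea and the
per-tile conversion, and the tile count of 32 is an assumption taken
from the "32+N" figure. Each thread claims a tile inside a named
critical section and does the expensive work outside of it.

```c
#include <assert.h>

#define TILE_COUNT 32       /* assumed tile count, from the "32+N" figure */
#define TILE_PIXELS 300000  /* roughly one preview tile on a 10MP image */

static int tile_dirty[TILE_COUNT];
static long tile_result[TILE_COUNT];
static int select_calls;    /* how many times the critical section ran */

/* Hypothetical stand-in for choose_subarea(): hands out the next tile
 * that still needs work, or -1 when none are left. It touches shared
 * state, so it must only be called from inside the critical section. */
static int claim_next_tile(void)
{
    select_calls++;
    for (int i = 0; i < TILE_COUNT; i++) {
        if (tile_dirty[i]) {
            tile_dirty[i] = 0;
            return i;
        }
    }
    return -1;
}

/* Hypothetical per-tile work, standing in for the real conversion. */
static void process_tile(int tile)
{
    assert(tile >= 0 && tile < TILE_COUNT);
    long sum = 0;
    for (long p = 0; p < TILE_PIXELS; p++)
        sum += (p ^ tile) & 0xff;
    tile_result[tile] = sum;
}

/* Marks every tile dirty, then lets each thread claim and process tiles
 * until none remain. Returns the number of tiles processed. */
int render_tiles(void)
{
    int processed = 0;

    select_calls = 0;
    for (int i = 0; i < TILE_COUNT; i++)
        tile_dirty[i] = 1;

    #pragma omp parallel reduction(+:processed)
    {
        for (;;) {
            int tile;

            /* Only the tile selection is serialized. */
            #pragma omp critical(tile_select)
            tile = claim_next_tile();

            if (tile < 0)
                break;
            process_tile(tile);  /* runs in parallel, no lock held */
            processed++;
        }
    }
    return processed;
}
```

In this sketch the critical section is entered once per tile plus once
per thread (the final call that returns -1), which is where a bound of
the form 32+N comes from. Built without OpenMP support the pragmas are
ignored and the loop simply runs on one thread.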
_______________________________________________ ufraw-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/ufraw-devel
