On Thu, Oct 15, 2009 at 12:50:08AM +0200, Andreas Sandberg wrote:
> I guess one possibility would be to find the areas prior to starting the
> parallel region.

That's possible, but tricky.

> Don't know if there is any noticeable improvement in
> the user experience from this, in fact I hardly notice any difference
> between running with 1 thread compared to 4 threads (I'm running on a
> Core2 Quad Q9450).

I did measure it and found a small improvement in responsiveness, but I don't recall how much. It would likely show up most at 100% zoom (which only existed in the devel version when I started writing this reply).

> Doesn't the 'omp parallel for' in develop() create additional threads
> when called from render_preview_image()?

It shouldn't. The tutorial I read indicated that nested directives are handled intelligently -- if there are already the maximum number of threads operating, no more are started. However, some of this may be implementation defined.

> It seems like ufraw_write_image_data parallelizes over the rows of the
> image. If both ufraw_write_image_data() and render_preview_image() are
> both parallelized what is the reason for having a 'parallel for' in the
> develop() procedure?

It isn't needed any more, unless you are running on a massively multi-core system (in which case the memory transfer overhead probably swamps the speedups). I think it was initially put there before the parallelism in ufraw_write_image_data and render_preview_image was complete. Given the concerns about nested OpenMP directives, you could take them out and test if there is any improvement.

> I did some measurements on a batch conversion of 19 raw images (6 MP).
> And got the following data (best time from two runs, CVS HEAD):
> # Threads, Run time (s), Speedup
> 1,46.66,1
> 2,33.53,1.39
> 3,26.14,1.79
> 4,26.86,1.74
>
> I would expect the speedup to decrease further when going beyond 4
> threads, but I haven't got any easy access to systems with more cores at
> the moment.

Actually, due to scheduler quirks, you can still get speedups when going past one thread per core.

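As a rough way to read the batch-conversion timings quoted above, Amdahl's law (not something this thread invokes; the model and every function name below are illustrative assumptions, not UFRaw code) relates the speedup on p threads to the fraction f of the work that parallelizes: S(p) = 1 / ((1 - f) + f/p). A minimal C sketch that computes the measured speedup and inverts the formula to estimate f:

```c
/* Amdahl's-law sketch for interpreting thread-scaling timings.
 * Hypothetical helper names; not part of UFRaw. */
#include <assert.h>

/* Measured speedup: single-thread run time divided by p-thread run time. */
double measured_speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Amdahl's law: predicted speedup on p threads when a fraction f of the
 * work parallelizes perfectly and the remaining (1 - f) runs serially. */
double amdahl_speedup(double f, int p)
{
    return 1.0 / ((1.0 - f) + f / p);
}

/* Upper bound on speedup as p grows without limit: 1 / (1 - f), for f < 1. */
double amdahl_limit(double f)
{
    return 1.0 / (1.0 - f);
}

/* Invert Amdahl's law to estimate the parallel fraction implied by an
 * observed speedup s on p threads:
 *   1/s = (1 - f) + f/p   =>   f = (1 - 1/s) / (1 - 1/p)
 * Only meaningful for p > 1. */
double parallel_fraction(double s, int p)
{
    assert(p > 1);
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / p);
}
```

Fed the quoted timings, parallel_fraction(measured_speedup(46.66, 33.53), 2) comes out at roughly 0.56, the 3-thread run at roughly 0.66 and the 4-thread run at roughly 0.57. Because the estimate is not constant across thread counts, a single fixed serial fraction is at best a rough fit for these numbers.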
> Haven't really had time to analyze where the bottlenecks
> are, but I guess that makes for a nice project a rainy day... :)

There are some parts of the file writing that simply must happen in sequence (for example, writing out the JPEG file), so it will never scale perfectly.

-- 
Bruce Guenter <[email protected]> http://untroubled.org/

_______________________________________________
ufraw-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ufraw-devel
