Hi Larry, Would this also affect OSL? Or would TBB still be needed there?
Best, Erich On Jun 14, 2012, at 5:13 PM, Larry Gritz wrote: > Anybody else want to comment or report timings? > > Are there any objections to my committing this to the master (i.e. improve > the non-TBB case, but not yet change the default for USE_TBB)? > > I'm also still open to changing USE_TBB to default to 0, or even removing TBB > entirely, if everybody's timings show that non-TBB is not significantly worse > than TBB across a wide variety of hardware and OS configurations. > > > > > On Jun 12, 2012, at 12:23 AM, Larry Gritz wrote: > >> https://github.com/OpenImageIO/oiio/pull/273 >> >> This was from a few months back, I didn't have enough success to push for a >> commit, but now I have a major update: >> >> After some weekend reading, I had some ideas for how to improve the spin >> locks and just pushed an update in which, on my two machines (a Mac OS X >> laptop, and a 12-core Linux box), my non-TBB spinlock now outperforms the >> TBB spinlocks (by a significant margin on Linux!). >> >> Could those of you on this thread please try this out? Check out the branch >> for this pull request, do "make nuke ; make USE_TBB=1 ; make test >> TEST=spinlock" and then the same thing again with USE_TBB=0 (be sure to do a >> full make nuke first). >> >> If the rest of you find the same thing -- that the USE_TBB=0 benchmark is no >> slower than the USE_TBB=1 -- then perhaps we can finally retire TBB. >> >> For the curious, the three improvements were: >> >> (a) exponential backoff for spin lock contention in the same manner that TBB >> was doing; >> >> (b) try_lock can spin more efficiency with read-only (non-bus-locking) check >> until there's a good chance that the lock was released. In particular, you >> should not spin like this: >> >> while (! try_lock(&thelock)) ; >> >> but instead >> >> while (! try_lock(&thelock)) >> while (thelock) ; >> >> This is because the try_lock does a compare-and-swap, which is a writing >> operation, which will lock the bus. In the latter version, if the initial >> (bus-locking) try-lock fails, it will do a read-only (non-bus-locking!) >> check until the lock appears to be free, then do the safe/locking one again. >> >> (c) on x86_64, it's safe for spin_lock::unlock() to do a normal unlocked >> write. The memory ordering of these chips is such that it doesn't need the >> memory fences of a full atomic write. I found numerous sources for this on >> the net, and it is a big win. Apparently, it also works on some 32 bit >> x86's, most of the recent chips, but I am not very interested in sorting out >> which x86 chips it's safe to do it on, especially since anybody heavily >> threaded enough to be concerned with this optimization is on a 64 bit system. >> >> OK, so let me know! If it works, it's a performance improvement as well as >> a way for us to shed the pesky TBB dependency. Win, win. >> >> >> >> >> On Apr 3, 2012, at 10:34 AM, Larry Gritz wrote: >> >>> I would very much appreciate if people could grab the code from this pull >>> request, and run: >>> >>> build/ARCH/libOpenImageIO/atomic_test >>> build/ARCH/libOpenImageIO/spinlock_test >>> >>> and report the timings after a fresh compile with USE_TBB ON, and again >>> with USE_TBB OFF. (On Linux and OSX, if you prefer running from 'make', >>> then just run 'make nuke ; make USE_TBB=1' or 'make nuke ; make USE_TBB=0'. >>> On Windows, or if you prefer otherwise, just set the CMake variable >>> USE_TBB to ON or OFF, respectively.) >>> >>> I'm especially keen to hear the results from people on Windows, as that is >>> the major platform that I have no way to test myself. >>> >>> If this makes it appear that there is no speed penalty from falling back on >>> the gcc & win32 intrinsics, compared to TBB, then I think we should >>> simplify our lives by removing TBB use entirely from OIIO. >>> >>> >>> >>> On Apr 3, 2012, at 10:25 AM, Larry Gritz wrote: >>> >>>> Make atomic_test and spinlock_test run for many more iterations, and time >>>> their results. >>>> >>>> This allows us to rigorously compare the speed of our atomics and spin >>>> locks. >>>> Also make their output a little neater by locking around the status >>>> printouts, and >>>> eliminate the #if's that provide safety for Boost < 1.35, which is no >>>> longer supported. >>>> >>>> You can merge this Pull Request by running: >>>> >>>> git pull https://github.com/lgritz/oiio lg-atomic >>>> >>>> Or you can view, comment on it, or merge it online at: >>>> >>>> https://github.com/OpenImageIO/oiio/pull/273 >>>> >>>> -- Commit Summary -- >>>> >>>> * Make atomic_test and spinlock_test run for many more iterations, and >>>> time their results. >>>> >>>> -- File Changes -- >>>> >>>> M src/libOpenImageIO/atomic_test.cpp (38) >>>> M src/libOpenImageIO/spinlock_test.cpp (29) >>>> >>>> -- Patch Links -- >>>> >>>> https://github.com/OpenImageIO/oiio/pull/273.patch >>>> https://github.com/OpenImageIO/oiio/pull/273.diff >>>> >>>> --- >>>> Reply to this email directly or view it on GitHub: >>>> https://github.com/OpenImageIO/oiio/pull/273 >>>> _______________________________________________ >>>> Oiio-dev mailing list >>>> [email protected] >>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org >>> >>> -- >>> Larry Gritz >>> [email protected] >>> >>> >>> _______________________________________________ >>> Oiio-dev mailing list >>> [email protected] >>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org >> >> -- >> Larry Gritz >> [email protected] >> >> >> _______________________________________________ >> Oiio-dev mailing list >> [email protected] >> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org > > -- > Larry Gritz > [email protected] > > > _______________________________________________ > Oiio-dev mailing list > [email protected] > http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org _______________________________________________ Oiio-dev mailing list [email protected] http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
