Anybody else want to comment or report timings? Are there any objections to my committing this to the master (i.e. improve the non-TBB case, but not yet change the default for USE_TBB)?
I'm also still open to changing USE_TBB to default to 0, or even removing TBB entirely, if everybody's timings show that non-TBB is not significantly worse than TBB across a wide variety of hardware and OS configurations. On Jun 12, 2012, at 12:23 AM, Larry Gritz wrote: > https://github.com/OpenImageIO/oiio/pull/273 > > This was from a few months back, I didn't have enough success to push for a > commit, but now I have a major update: > > After some weekend reading, I had some ideas for how to improve the spin > locks and just pushed an update in which, on my two machines (a Mac OS X > laptop, and a 12-core Linux box), my non-TBB spinlock now outperforms the TBB > spinlocks (by a significant margin on Linux!). > > Could those of you on this thread please try this out? Check out the branch > for this pull request, do "make nuke ; make USE_TBB=1 ; make test > TEST=spinlock" and then the same thing again with USE_TBB=0 (be sure to do a > full make nuke first). > > If the rest of you find the same thing -- that the USE_TBB=0 benchmark is no > slower than the USE_TBB=1 -- then perhaps we can finally retire TBB. > > For the curious, the three improvements were: > > (a) exponential backoff for spin lock contention in the same manner that TBB > was doing; > > (b) try_lock can spin more efficiency with read-only (non-bus-locking) check > until there's a good chance that the lock was released. In particular, you > should not spin like this: > > while (! try_lock(&thelock)) ; > > but instead > > while (! try_lock(&thelock)) > while (thelock) ; > > This is because the try_lock does a compare-and-swap, which is a writing > operation, which will lock the bus. In the latter version, if the initial > (bus-locking) try-lock fails, it will do a read-only (non-bus-locking!) check > until the lock appears to be free, then do the safe/locking one again. > > (c) on x86_64, it's safe for spin_lock::unlock() to do a normal unlocked > write. The memory ordering of these chips is such that it doesn't need the > memory fences of a full atomic write. I found numerous sources for this on > the net, and it is a big win. Apparently, it also works on some 32 bit > x86's, most of the recent chips, but I am not very interested in sorting out > which x86 chips it's safe to do it on, especially since anybody heavily > threaded enough to be concerned with this optimization is on a 64 bit system. > > OK, so let me know! If it works, it's a performance improvement as well as a > way for us to shed the pesky TBB dependency. Win, win. > > > > > On Apr 3, 2012, at 10:34 AM, Larry Gritz wrote: > >> I would very much appreciate if people could grab the code from this pull >> request, and run: >> >> build/ARCH/libOpenImageIO/atomic_test >> build/ARCH/libOpenImageIO/spinlock_test >> >> and report the timings after a fresh compile with USE_TBB ON, and again with >> USE_TBB OFF. (On Linux and OSX, if you prefer running from 'make', then >> just run 'make nuke ; make USE_TBB=1' or 'make nuke ; make USE_TBB=0'. On >> Windows, or if you prefer otherwise, just set the CMake variable USE_TBB to >> ON or OFF, respectively.) >> >> I'm especially keen to hear the results from people on Windows, as that is >> the major platform that I have no way to test myself. >> >> If this makes it appear that there is no speed penalty from falling back on >> the gcc & win32 intrinsics, compared to TBB, then I think we should simplify >> our lives by removing TBB use entirely from OIIO. >> >> >> >> On Apr 3, 2012, at 10:25 AM, Larry Gritz wrote: >> >>> Make atomic_test and spinlock_test run for many more iterations, and time >>> their results. >>> >>> This allows us to rigorously compare the speed of our atomics and spin >>> locks. >>> Also make their output a little neater by locking around the status >>> printouts, and >>> eliminate the #if's that provide safety for Boost < 1.35, which is no >>> longer supported. >>> >>> You can merge this Pull Request by running: >>> >>> git pull https://github.com/lgritz/oiio lg-atomic >>> >>> Or you can view, comment on it, or merge it online at: >>> >>> https://github.com/OpenImageIO/oiio/pull/273 >>> >>> -- Commit Summary -- >>> >>> * Make atomic_test and spinlock_test run for many more iterations, and time >>> their results. >>> >>> -- File Changes -- >>> >>> M src/libOpenImageIO/atomic_test.cpp (38) >>> M src/libOpenImageIO/spinlock_test.cpp (29) >>> >>> -- Patch Links -- >>> >>> https://github.com/OpenImageIO/oiio/pull/273.patch >>> https://github.com/OpenImageIO/oiio/pull/273.diff >>> >>> --- >>> Reply to this email directly or view it on GitHub: >>> https://github.com/OpenImageIO/oiio/pull/273 >>> _______________________________________________ >>> Oiio-dev mailing list >>> [email protected] >>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org >> >> -- >> Larry Gritz >> [email protected] >> >> >> _______________________________________________ >> Oiio-dev mailing list >> [email protected] >> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org > > -- > Larry Gritz > [email protected] > > > _______________________________________________ > Oiio-dev mailing list > [email protected] > http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org -- Larry Gritz [email protected] _______________________________________________ Oiio-dev mailing list [email protected] http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
