Anybody else want to comment or report timings?

Are there any objections to my committing this to the master (i.e. improve the 
non-TBB case, but not yet change the default for USE_TBB)?

I'm also still open to changing USE_TBB to default to 0, or even removing TBB 
entirely, if everybody's timings show that non-TBB is not significantly worse 
than TBB across a wide variety of hardware and OS configurations.




On Jun 12, 2012, at 12:23 AM, Larry Gritz wrote:

> https://github.com/OpenImageIO/oiio/pull/273
> 
> This was from a few months back, I didn't have enough success to push for a 
> commit, but now I have a major update:
> 
> After some weekend reading, I had some ideas for how to improve the spin 
> locks and just pushed an update in which, on my two machines (a Mac OS X 
> laptop, and a 12-core Linux box), my non-TBB spinlock now outperforms the TBB 
> spinlocks (by a significant margin on Linux!).
> 
> Could those of you on this thread please try this out?  Check out the branch 
> for this pull request, do "make nuke ; make USE_TBB=1 ; make test 
> TEST=spinlock" and then the same thing again with USE_TBB=0 (be sure to do a 
> full make nuke first).
> 
> If the rest of you find the same thing -- that the USE_TBB=0 benchmark is no 
> slower than the USE_TBB=1 -- then perhaps we can finally retire TBB.
> 
> For the curious, the three improvements were: 
> 
> (a) exponential backoff for spin lock contention in the same manner that TBB 
> was doing;
> 
> (b) try_lock can spin more efficiency with read-only (non-bus-locking) check 
> until there's a good chance that the lock was released.  In particular, you 
> should not spin like this:
> 
>   while (! try_lock(&thelock)) ;
> 
> but instead
> 
>    while (! try_lock(&thelock))
>        while (thelock) ;
> 
> This is because the try_lock does a compare-and-swap, which is a writing 
> operation, which will lock the bus.  In the latter version, if the initial 
> (bus-locking) try-lock fails, it will do a read-only (non-bus-locking!) check 
> until the lock appears to be free, then do the safe/locking one again.
> 
> (c) on x86_64, it's safe for spin_lock::unlock() to do a normal unlocked 
> write.  The memory ordering of these chips is such that it doesn't need the 
> memory fences of a full atomic write.  I found numerous sources for this on 
> the net, and it is a big win.  Apparently, it also works on some 32 bit 
> x86's, most of the recent chips, but I am not very interested in sorting out 
> which x86 chips it's safe to do it on, especially since anybody heavily 
> threaded enough to be concerned with this optimization is on a 64 bit system.
> 
> OK, so let me know!  If it works, it's a performance improvement as well as a 
> way for us to shed the pesky TBB dependency.  Win, win.
> 
> 
> 
> 
> On Apr 3, 2012, at 10:34 AM, Larry Gritz wrote:
> 
>> I would very much appreciate if people could grab the code from this pull 
>> request, and run:
>> 
>>      build/ARCH/libOpenImageIO/atomic_test
>>      build/ARCH/libOpenImageIO/spinlock_test
>> 
>> and report the timings after a fresh compile with USE_TBB ON, and again with 
>> USE_TBB OFF.  (On Linux and OSX, if you prefer running from 'make', then 
>> just run 'make nuke ; make USE_TBB=1' or 'make nuke ; make USE_TBB=0'.  On 
>> Windows, or if you prefer otherwise, just set the CMake variable USE_TBB to 
>> ON or OFF, respectively.)
>> 
>> I'm especially keen to hear the results from people on Windows, as that is 
>> the major platform that I have no way to test myself.
>> 
>> If this makes it appear that there is no speed penalty from falling back on 
>> the gcc & win32 intrinsics, compared to TBB, then I think we should simplify 
>> our lives by removing TBB use entirely from OIIO.
>> 
>> 
>> 
>> On Apr 3, 2012, at 10:25 AM, Larry Gritz wrote:
>> 
>>> Make atomic_test and spinlock_test run for many more iterations, and time 
>>> their results.
>>> 
>>> This allows us to rigorously compare the speed of our atomics and spin 
>>> locks.
>>> Also make their output a little neater by locking around the status 
>>> printouts, and
>>> eliminate the #if's that provide safety for Boost < 1.35, which is no 
>>> longer supported.
>>> 
>>> You can merge this Pull Request by running:
>>> 
>>> git pull https://github.com/lgritz/oiio lg-atomic
>>> 
>>> Or you can view, comment on it, or merge it online at:
>>> 
>>> https://github.com/OpenImageIO/oiio/pull/273
>>> 
>>> -- Commit Summary --
>>> 
>>> * Make atomic_test and spinlock_test run for many more iterations, and time 
>>> their results.
>>> 
>>> -- File Changes --
>>> 
>>> M src/libOpenImageIO/atomic_test.cpp (38)
>>> M src/libOpenImageIO/spinlock_test.cpp (29)
>>> 
>>> -- Patch Links --
>>> 
>>> https://github.com/OpenImageIO/oiio/pull/273.patch
>>> https://github.com/OpenImageIO/oiio/pull/273.diff
>>> 
>>> --- 
>>> Reply to this email directly or view it on GitHub:
>>> https://github.com/OpenImageIO/oiio/pull/273
>>> _______________________________________________
>>> Oiio-dev mailing list
>>> [email protected]
>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
>> 
>> --
>> Larry Gritz
>> [email protected]
>> 
>> 
>> _______________________________________________
>> Oiio-dev mailing list
>> [email protected]
>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
> 
> --
> Larry Gritz
> [email protected]
> 
> 
> _______________________________________________
> Oiio-dev mailing list
> [email protected]
> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org

--
Larry Gritz
[email protected]


_______________________________________________
Oiio-dev mailing list
[email protected]
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org

Reply via email to