Re: [Oiio-dev] [oiio] Make atomic_test and spinlock_test run for many more iterations, and time their results. (#273)

Erich Ocean Thu, 21 Jun 2012 13:56:22 -0700

Hi Larry,

I have no objections, and am happy to see one less dependency, plus the better 
performance.


Best, Erich

On Jun 21, 2012, at 11:05 AM, Larry Gritz wrote:

> Erich, did you have any objections to switching OIIO and OSL to defaulting to 
> USE_TBB=0?  (Or anybody else?)
> 
> I'm not going to rip the TBB code out quite yet, so anybody is still free to 
> compile with USE_TBB=1 if they think they'll get better performance that way 
> or don't trust our locks.
> 
> It's not a slam-dunk win in all circumstances, but mostly what I'm seeing 
> across several platforms is that the our new code (USE_TBB=0) is slightly 
> slower than TBB for a small number of threads, but MUCH faster than TBB for 
> high thread counts.
> 
>       -- lg
> 
> 
> On Jun 14, 2012, at 5:17 PM, Erich Ocean wrote:
> 
>> Hi Larry,
>> 
>> Would this also affect OSL? Or would TBB still be needed there?
>> 
>> Best, Erich
>> 
>> On Jun 14, 2012, at 5:13 PM, Larry Gritz wrote:
>> 
>>> Anybody else want to comment or report timings?
>>> 
>>> Are there any objections to my committing this to the master (i.e. improve 
>>> the non-TBB case, but not yet change the default for USE_TBB)?
>>> 
>>> I'm also still open to changing USE_TBB to default to 0, or even removing 
>>> TBB entirely, if everybody's timings show that non-TBB is not significantly 
>>> worse than TBB across a wide variety of hardware and OS configurations.
>>> 
>>> 
>>> 
>>> 
>>> On Jun 12, 2012, at 12:23 AM, Larry Gritz wrote:
>>> 
>>>> https://github.com/OpenImageIO/oiio/pull/273
>>>> 
>>>> This was from a few months back, I didn't have enough success to push for 
>>>> a commit, but now I have a major update:
>>>> 
>>>> After some weekend reading, I had some ideas for how to improve the spin 
>>>> locks and just pushed an update in which, on my two machines (a Mac OS X 
>>>> laptop, and a 12-core Linux box), my non-TBB spinlock now outperforms the 
>>>> TBB spinlocks (by a significant margin on Linux!).
>>>> 
>>>> Could those of you on this thread please try this out?  Check out the 
>>>> branch for this pull request, do "make nuke ; make USE_TBB=1 ; make test 
>>>> TEST=spinlock" and then the same thing again with USE_TBB=0 (be sure to do 
>>>> a full make nuke first).
>>>> 
>>>> If the rest of you find the same thing -- that the USE_TBB=0 benchmark is 
>>>> no slower than the USE_TBB=1 -- then perhaps we can finally retire TBB.
>>>> 
>>>> For the curious, the three improvements were: 
>>>> 
>>>> (a) exponential backoff for spin lock contention in the same manner that 
>>>> TBB was doing;
>>>> 
>>>> (b) try_lock can spin more efficiency with read-only (non-bus-locking) 
>>>> check until there's a good chance that the lock was released.  In 
>>>> particular, you should not spin like this:
>>>> 
>>>> while (! try_lock(&thelock)) ;
>>>> 
>>>> but instead
>>>> 
>>>> while (! try_lock(&thelock))
>>>>     while (thelock) ;
>>>> 
>>>> This is because the try_lock does a compare-and-swap, which is a writing 
>>>> operation, which will lock the bus.  In the latter version, if the initial 
>>>> (bus-locking) try-lock fails, it will do a read-only (non-bus-locking!) 
>>>> check until the lock appears to be free, then do the safe/locking one 
>>>> again.
>>>> 
>>>> (c) on x86_64, it's safe for spin_lock::unlock() to do a normal unlocked 
>>>> write.  The memory ordering of these chips is such that it doesn't need 
>>>> the memory fences of a full atomic write.  I found numerous sources for 
>>>> this on the net, and it is a big win.  Apparently, it also works on some 
>>>> 32 bit x86's, most of the recent chips, but I am not very interested in 
>>>> sorting out which x86 chips it's safe to do it on, especially since 
>>>> anybody heavily threaded enough to be concerned with this optimization is 
>>>> on a 64 bit system.
>>>> 
>>>> OK, so let me know!  If it works, it's a performance improvement as well 
>>>> as a way for us to shed the pesky TBB dependency.  Win, win.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Apr 3, 2012, at 10:34 AM, Larry Gritz wrote:
>>>> 
>>>>> I would very much appreciate if people could grab the code from this pull 
>>>>> request, and run:
>>>>> 
>>>>>   build/ARCH/libOpenImageIO/atomic_test
>>>>>   build/ARCH/libOpenImageIO/spinlock_test
>>>>> 
>>>>> and report the timings after a fresh compile with USE_TBB ON, and again 
>>>>> with USE_TBB OFF.  (On Linux and OSX, if you prefer running from 'make', 
>>>>> then just run 'make nuke ; make USE_TBB=1' or 'make nuke ; make 
>>>>> USE_TBB=0'.  On Windows, or if you prefer otherwise, just set the CMake 
>>>>> variable USE_TBB to ON or OFF, respectively.)
>>>>> 
>>>>> I'm especially keen to hear the results from people on Windows, as that 
>>>>> is the major platform that I have no way to test myself.
>>>>> 
>>>>> If this makes it appear that there is no speed penalty from falling back 
>>>>> on the gcc & win32 intrinsics, compared to TBB, then I think we should 
>>>>> simplify our lives by removing TBB use entirely from OIIO.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Apr 3, 2012, at 10:25 AM, Larry Gritz wrote:
>>>>> 
>>>>>> Make atomic_test and spinlock_test run for many more iterations, and 
>>>>>> time their results.
>>>>>> 
>>>>>> This allows us to rigorously compare the speed of our atomics and spin 
>>>>>> locks.
>>>>>> Also make their output a little neater by locking around the status 
>>>>>> printouts, and
>>>>>> eliminate the #if's that provide safety for Boost < 1.35, which is no 
>>>>>> longer supported.
>>>>>> 
>>>>>> You can merge this Pull Request by running:
>>>>>> 
>>>>>> git pull https://github.com/lgritz/oiio lg-atomic
>>>>>> 
>>>>>> Or you can view, comment on it, or merge it online at:
>>>>>> 
>>>>>> https://github.com/OpenImageIO/oiio/pull/273
>>>>>> 
>>>>>> -- Commit Summary --
>>>>>> 
>>>>>> * Make atomic_test and spinlock_test run for many more iterations, and 
>>>>>> time their results.
>>>>>> 
>>>>>> -- File Changes --
>>>>>> 
>>>>>> M src/libOpenImageIO/atomic_test.cpp (38)
>>>>>> M src/libOpenImageIO/spinlock_test.cpp (29)
>>>>>> 
>>>>>> -- Patch Links --
>>>>>> 
>>>>>> https://github.com/OpenImageIO/oiio/pull/273.patch
>>>>>> https://github.com/OpenImageIO/oiio/pull/273.diff
>>>>>> 
>>>>>> --- 
>>>>>> Reply to this email directly or view it on GitHub:
>>>>>> https://github.com/OpenImageIO/oiio/pull/273
>>>>>> _______________________________________________
>>>>>> Oiio-dev mailing list
>>>>>> [email protected]
>>>>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
>>>>> 
>>>>> --
>>>>> Larry Gritz
>>>>> [email protected]
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Oiio-dev mailing list
>>>>> [email protected]
>>>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
>>>> 
>>>> --
>>>> Larry Gritz
>>>> [email protected]
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Oiio-dev mailing list
>>>> [email protected]
>>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
>>> 
>>> --
>>> Larry Gritz
>>> [email protected]
>>> 
>>> 
>>> _______________________________________________
>>> Oiio-dev mailing list
>>> [email protected]
>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
>> 
>> _______________________________________________
>> Oiio-dev mailing list
>> [email protected]
>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
>> 
> 
> --
> Larry Gritz
> [email protected]
> 
> 
> _______________________________________________
> Oiio-dev mailing list
> [email protected]
> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org

_______________________________________________
Oiio-dev mailing list
[email protected]
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org

Re: [Oiio-dev] [oiio] Make atomic_test and spinlock_test run for many more iterations, and time their results. (#273)

Reply via email to