Hi Mikhail,

Regarding Robert's advice about custom new/delete: it might be a good idea
to wrap an allocator such as jemalloc (http://www.canonware.com/jemalloc/)
in new/delete. In Blender, its introduction radically solved most memory
allocation problems. That might not only solve this particular problem but
also improve memory allocation across the whole scene graph, and it could
be relatively cheap to code.

Regards,
Sergey

On Mon, Jun 30, 2014 at 9:36 PM, Sebastian Messerschmidt <
[email protected]> wrote:

> Hi Mikhail,
>
> Actually, I would like to see some performance numbers too before
> believing that adding a mutex at the low level will actually improve
> anything. Usually, in multicore implementations, applying mutexes at such
> a low level results in slower code, since threads wait on the mutexes,
> which increases the chance of stalling.
> Allocations are usually extremely fast, even with multiple threads, even
> with crappy implementations. So please provide some benchmarking here to
> convince Robert.
>
> Cheers
> Sebastian
>
>
>> 30.06.2014 14:58, Robert Osfield wrote:
>>
>>> What performance profiling have you done and what type of models? What
>>> scale of performance improvement are you seeing?
>>>
>> OK, I just got fresh numbers for my case.
>> 10 frames rendered, allocation counts in the render thread:
>>
>>            | total | related to StateGraph |
>> original   | 32039 | 21440 (67%)           |
>> with cache | 17677 |  3042 (17%)           |
>>
>> I'm getting these numbers from a real app, without any modifications
>> (using xperf to account for allocation counts and windbg to count 10
>> frames), so I'm not able to make the absolute numbers match (I mean
>> 32039-17677 is not equal to 21440-3042), but the relative numbers can be
>> compared.
>>
>> Only the allocation count has improved so far, but you may ask: what
>> about speed improvements?
>> When no one wants to do a lot of allocations at the same time, there is
>> no speed improvement; the heap is fast enough. But when other threads are
>> doing a lot of heap allocations, we again depend on the heap
>> implementation: if it can handle allocations from different threads fast
>> enough, then there is no speed improvement.
>> But as you can see, we need 2k allocations per frame, so when the heap is
>> slower than usual on each allocation, the overall slowdown can be
>> significant.
>>
>>> I've just done a review and I feel the extra complexity and the
>>> introduction of mutexes aren't something I'm happy with as a solution.
>>>
>> Yep, the extra complexity is required because we are trying to handle
>> part of the heap's job ourselves.
>> I think it is possible to make this feature configurable at compile time
>> or at runtime.
>>
>>
>> Thanks,
>> Mikhail.
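
To make the new/delete wrapping idea from the top of this message concrete, here is a minimal sketch, not a tested patch for OSG. It replaces the global operator new/delete so that every allocation goes through two small backend hooks, and it keeps a counter so allocation counts can be compared the way Mikhail did with xperf. The names backend_alloc, backend_free and allocation_count are hypothetical. The hooks call std::malloc/std::free so the sketch compiles without jemalloc; the assumption is that a jemalloc build configured with a symbol prefix such as je_ would let the hooks call je_malloc/je_free instead.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace {

// Hypothetical backend hooks. They forward to the C heap here; with a
// prefixed jemalloc build (assumption) they would call je_malloc/je_free.
inline void* backend_alloc(std::size_t size) { return std::malloc(size); }
inline void backend_free(void* ptr) noexcept { std::free(ptr); }

// Number of successful allocations made through the replaced operator new.
std::atomic<std::size_t> g_allocation_count{0};

} // namespace

// Hypothetical helper: read the counter, e.g. before and after a frame.
std::size_t allocation_count() noexcept {
    return g_allocation_count.load(std::memory_order_relaxed);
}

// Replacement for the global allocation function. The array form and the
// nothrow forms supplied by the standard library forward to this one.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // operator new must return a unique pointer
    for (;;) {
        if (void* ptr = backend_alloc(size)) {
            g_allocation_count.fetch_add(1, std::memory_order_relaxed);
            return ptr;
        }
        // Standard protocol: give the installed new-handler a chance to
        // release memory, otherwise report failure with std::bad_alloc.
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();
        handler();
    }
}

// Matching deallocation functions; both must use the same backend.
void operator delete(void* ptr) noexcept { backend_free(ptr); }
void operator delete(void* ptr, std::size_t) noexcept { backend_free(ptr); }
```

Reading allocation_count() before and after rendering a frame gives a per-frame allocation count without an external profiler. The counter is a relaxed atomic rather than a mutex-protected value, so the wrapper itself adds no lock to the allocation path; whether the backend scales across threads is a property of the allocator behind the hooks.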
_______________________________________________
osg-submissions mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org
