Hi Robert,
I had a deep look into Windows heap allocation; it uses lock-free
structures and it is extremely fast.
I hope this submission did not take much of your time.
Thanks,
Mikhail.
30.06.2014 19:00, Robert Osfield wrote:
Hi Mikhail,
Actual speed improvements are what I care about, not the theoretical
impact of the number of heap allocations.
As for making your proposed change optional, this only complicates
the code further.
Right now I have *no* evidence that the modification is actually
required from a performance standpoint. The code adds complexity and
will be harder to maintain and debug. For any extra complexity I set
the bar higher for justifying its inclusion - it has to have a real and
measurable benefit to justify the extra cost of managing the code.
If you care critically about heap allocations then you can always
override new/delete and provide your own custom scheme.
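As an illustration of the new/delete override suggested above, here is a minimal C++ sketch that replaces the global operators in order to count allocations. The counter `g_allocationCount` and the helper `countAllocations` are hypothetical names, not part of OSG, and a real custom scheme would put its own allocator where this sketch calls `std::malloc`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter (not part of OSG): number of calls to operator new.
static std::atomic<std::size_t> g_allocationCount{0};

// Replacement for the global allocation function. The default array form,
// operator new[], forwards to this one, so array allocations are counted too.
void* operator new(std::size_t size)
{
    g_allocationCount.fetch_add(1, std::memory_order_relaxed);
    if (size == 0) size = 1;  // malloc(0) may return a null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions; memory came from malloc, so use free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper: returns how many allocations `work` performed.
template <typename Work>
std::size_t countAllocations(Work work)
{
    const std::size_t before = g_allocationCount.load(std::memory_order_relaxed);
    work();
    const std::size_t after = g_allocationCount.load(std::memory_order_relaxed);
    assert(after >= before);  // the counter only ever grows
    return after - before;
}
```

Wrapping a frame's worth of work in `countAllocations` would give a per-frame count from inside the application. Allocations made by other threads during the call are included, so this is only a rough measure.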
The other aspect you can look at to avoid all these allocations is to
have a scene graph that is less fine grained - if you are CPU limited
due to draw dispatch then there is a good chance that just building the
scene graph in a different way will address it.
When you have exhausted all these options and proven the case that
this bottleneck is a real issue then we can come back and look at
caching StateGraph objects.
Robert.
On 30 June 2014 15:36, Mikhail Izmestev wrote:
30.06.2014 14:58, Robert Osfield wrote:
What performance profiling have you done and what type of
models? What scale of performance improvement are you seeing?
OK, I just got fresh numbers for my case.
10 frames rendered, allocation counts in the render thread:
           | total | related to StateGraph |
original   | 32039 | 21440 (67%)           |
with cache | 17677 |  3042 (17%)           |
I got these numbers from a real app, without any modifications
(using xperf to account for the allocation count and windbg to count
10 frames), so I'm not able to make the absolute numbers match (I
mean 32039-17677 is not equal to 21440-3042), but the relative
numbers can be compared.
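For reference, the arithmetic behind the table can be checked with a short C++ sketch. The struct and function names are hypothetical and only the four counts and the frame count come from the measurements above.

```cpp
#include <cassert>
#include <cmath>

// Allocation counts over the measured run (numbers from the table above).
struct Sample
{
    long total;       // all allocations in the render thread
    long stateGraph;  // allocations related to StateGraph
};

constexpr Sample kOriginal{32039, 21440};
constexpr Sample kWithCache{17677, 3042};
constexpr int kFrames = 10;

// StateGraph share of all allocations, in percent, rounded to nearest.
long sharePercent(const Sample& s)
{
    assert(s.total > 0);
    return std::lround(100.0 * s.stateGraph / s.total);
}

// Average number of allocations per rendered frame.
double perFrame(long count)
{
    return static_cast<double>(count) / kFrames;
}
```

With these inputs the shares come out at 67% and 17%, the original run averages 2144 StateGraph-related allocations per frame, and the two differences are 14362 (total) and 18398 (StateGraph-related), which is the mismatch in absolute numbers mentioned above.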
So far that is only an improvement in allocation count, but you may
ask: what about speed improvements?
When nothing else is doing a lot of allocations at the same time,
there is no speed improvement; the heap is fast enough. But when
other threads are doing a lot of heap allocations, we again depend
on the heap implementation: if it can handle allocations from
different threads fast enough, then there is no speed improvement.
But as you can see, we need 2k allocations per frame, so when the
heap is slower than usual on each allocation, the overall slowdown
can be significant.
I've just done a review and I feel the extra complexity and
the introduction of mutexes aren't something I'm happy with as
a solution.
Yep, the extra complexity is required because we are trying to handle
part of the heap's job ourselves.
I think it is possible to make this feature configurable at
compile time or at runtime.
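To make the trade-off under discussion concrete, here is a minimal C++ sketch of a mutex-protected object cache. It is not the submitted patch: `Node` is a hypothetical stand-in for a per-frame object such as a StateGraph, and `NodeCache` and its members are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-frame object such as a StateGraph.
struct Node
{
    int depth = 0;
    void reset() { depth = 0; }
};

// Keeps released objects and hands them out again instead of going
// back to the heap. Every call takes the mutex, which is the extra
// cost and complexity raised in the review.
class NodeCache
{
public:
    std::unique_ptr<Node> acquire()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_free.empty())
        {
            ++_heapAllocations;  // cache miss: fall back to the heap
            return std::unique_ptr<Node>(new Node);
        }
        std::unique_ptr<Node> node = std::move(_free.back());
        _free.pop_back();
        return node;
    }

    void release(std::unique_ptr<Node> node)
    {
        assert(node);   // releasing a null pointer is a caller bug
        node->reset();  // clear per-frame state before reuse
        std::lock_guard<std::mutex> lock(_mutex);
        _free.push_back(std::move(node));  // the vector itself may still allocate
    }

    // Number of Node objects that had to be allocated from the heap.
    std::size_t heapAllocations() const
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _heapAllocations;
    }

private:
    mutable std::mutex _mutex;
    std::vector<std::unique_ptr<Node>> _free;
    std::size_t _heapAllocations = 0;
};
```

If every frame acquires the same number of objects and releases them at the end, only the first frame reaches the heap for `Node` objects. A compile-time or runtime switch, as suggested above, could select between such a cache and plain `new`.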
Thanks,
Mikhail.
_______________________________________________
osg-submissions mailing list
[email protected]
<mailto:[email protected]>
http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org