Hi Robert,

I took a deep look into Windows heap allocation; it uses lock-free structures and is extremely fast.

I hope this submission did not take up a lot of your time.

Thanks,
Mikhail.

30.06.2014 19:00, Robert Osfield wrote:
Hi Mikhail,

Actual speed improvements are what I care about, not the theoretical impact of the number of heap allocations.

As for making your proposed change optional, this only complicates the code further.

Right now I have *no* evidence that the modification is actually required from a performance standpoint. The code adds complexity and will be harder to maintain and debug. For any extra complexity I set the bar higher for justifying its inclusion - it has to have a real and measurable benefit to justify the extra cost of managing the code.

If you care critically about heap allocations then you can always override new/delete and provide your own custom scheme.
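
As an illustration of the new/delete override mentioned above, here is a minimal sketch, assuming a plain malloc/free backend and a process-wide counter. The names g_allocCount and countAllocations are hypothetical and are not part of OSG; a real custom scheme would replace the malloc/free calls with its own pool or arena.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical process-wide counter (not an OSG facility).
static std::atomic<std::size_t> g_allocCount{0};

// Replacement for the global allocation function. The default
// operator new[] forwards here, so array allocations are counted too.
void* operator new(std::size_t size)
{
    g_allocCount.fetch_add(1, std::memory_order_relaxed);
    if (size == 0) size = 1;              // malloc(0) may return null
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions; memory came from malloc above.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many allocations happened while f() was running.
// Allocations made by other threads during the call are included.
template <typename F>
std::size_t countAllocations(F&& f)
{
    const std::size_t before = g_allocCount.load(std::memory_order_relaxed);
    f();
    return g_allocCount.load(std::memory_order_relaxed) - before;
}
```

Wrapping a single frame's traversal in countAllocations gives a per-frame allocation count that can be compared before and after a change, without an external profiler. It measures counts only, not time, so it does not replace real timing measurements.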

The other aspect you can look at to avoid all these allocations is to have a scene graph that is less fine grained - if you are CPU limited due to draw dispatch then there is a good chance that just building the scene graph in a different way will address it.

When you have exhausted all these options and proven the case that this bottleneck is a real issue then we can come back and look at caching StateGraph objects.

Robert.


On 30 June 2014 15:36, Mikhail Izmestev <[email protected]> wrote:


    30.06.2014 14:58, Robert Osfield wrote:

        What performance profiling have you done and what type of
        models? What scale of performance improvement are you seeing?

    OK, I just got fresh numbers for my case.
    10 frames rendered, allocation counts in the render thread:
               | total | related to StateGraph |
    original   | 32039 | 21440 (67%)           |
    with cache | 17677 |  3042 (17%)           |

    I got these numbers from a real app, without any modifications
    (using xperf to account for the allocation counts and windbg to
    count 10 frames), so I'm not able to make the absolute numbers
    match (I mean 32039-17677 is not equal to 21440-3042), but the
    relative numbers can be compared.

    That is only the allocation count that improved, but you may ask:
    what about speed improvements?
    When nothing else is doing a lot of allocations at the same time,
    there is no speed improvement; the heap is fast enough. But when
    other threads are doing a lot of heap allocations, we again depend
    on the heap implementation: if it can handle allocations from
    different threads fast enough, then there is no speed improvement.
    But as you can see, we need about 2k allocations per frame, so if
    the heap is slower than usual on each allocation, the overall
    slowdown can be significant.


        I've just done a review and I feel the extra complexity and
        the introduction of mutexes aren't something I'm happy with as
        a solution.

    Yep, the extra complexity is required because we are trying to
    handle part of the heap's job ourselves.
    I think it is possible to make this feature configurable at
    compile time or at runtime.



    Thanks,
    Mikhail.
    _______________________________________________
    osg-submissions mailing list
    [email protected]
    <mailto:[email protected]>
    
http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org



