On Sat, Oct 5, 2013 at 5:03 AM, Joachim Durchholz <[email protected]> wrote: > Am 04.10.2013 19:37, schrieb Ondřej Čertík: > >> On Thu, Oct 3, 2013 at 12:40 PM, Joachim Durchholz <[email protected]> >> wrote: >>> >>> Am 03.10.2013 17:04, schrieb Ondřej Čertík: >>>> >>>> >>>> On Thu, Oct 3, 2013 at 1:50 AM, Joachim Durchholz <[email protected]> >>>> wrote: >>>>> >>>>> >>>>> Am 03.10.2013 07:46, schrieb Ondřej Čertík: >>>> >>>> >>>> [...] >>>>>>> >>>>>>> >>>>>>> Garbage management is a real issue in C++. >>>>>> >>>>>> >>>>>> >>>>>> Currently I use reference counting. I don't see any issue --- can you >>>>>> elaborate >>>>>> on what you mean? >>>>> >>>>> >>>>> >>>>> See http://www.hpl.hp.com/personal/Hans_Boehm/gc/issues.html for issues >>>>> with >>>>> and around the BDW collector. >>>>> >>>>> There's also the seminal Zorn paper "Memory Allocation Costs in Large C >>>>> and >>>>> C++ Programs" [...] >>>> >>>> >>>> >>>> I see, thanks for the pointers. They are a bit old, so maybe things >>>> have improved since then. >>> >>> >>> >>> I think both mark&sweep and reference-counting approaches have improved >>> since that paper, but the relative performance is still the same. >>> >>> RC does have its niches, but the problem is that as applications evolve, >>> they tend to leave the RC-friendly niche and then you need to rewrite all >>> the allocation code because the RC scheme is visible over all of the >>> code. >>> Hmm... okay, you could probably stub out whatever SmartPointer<> class >>> you're using, dropping all the reference counting code, the variables, >>> and >>> the free calls, and see what the performance is. Your code might still be >>> geared towards being RC-friendly instead of mark&sweep-friendly, but it >>> would give at least a preliminary result. >>> >>> In fact it would be interesting to see how the performance figures have >>> evolved since the papers were written. >> >> >> Good ideas --- here is the RCP class that I created for CSymPy: >> >> https://github.com/certik/csympy/blob/master/src/csympy_rcp.h#L59 >> >> $ ./expand2 >> Expanding: ((y + x + z + w)^15 + w)*(y + x + z + w)^15 >> 1217ms >> number of terms: 6272 >> >> >> So I deleted all reference counting and deleting of objects (patch >> here https://gist.github.com/certik/6829650) >> --- but I left the RCP class for now. Just to get an idea. >> >> $ ./expand2 >> Expanding: ((y + x + z + w)^15 + w)*(y + x + z + w)^15 >> 1382ms >> number of terms: 6272 >> >> >> And things got slower... Haha, that was unexpected. Why would that be? > > > None at all... except that benchmarking is difficult :-) > > Did you run your test multiple times?
Yes. I run it about 10 to 15 times in two terminals with and without the feature that I want to measure and I report the best times in both. > A difference for one-off runs can have a multitude of reasons that are > related to the system's state, you need a warm-up phase and you need to > observe the variance. I don't measure variance, only by looking at it. Do you see a problem with reporting the best time? > Observing actual memory usage might also be > interesting, just to get an idea how much of a role memory management > actually plays in expand2. Your RCP class could count memory sizes and > delete requests. Good point, I should play with that. > > Maybe this result indicates that questions of garbage collection are > irrelevant for expand2, possibly because it generates almost no garbage. > OTOH if the program is generating lots of garbage, it may be that the > program spent a lot of time increasing the heap, setting up data structures > and zeroing out memory. > > >> Based on my experience, one always has to try various approaches and >> pick the fastest. >> RC has been pretty robust in my experience, but I will try some GC >> approaches >> as well. I feel there is at least 10% maybe more potential for speedup >> thanks to >> better memory management. > > > As I said - profiling indicates that allocation-intensive programs can spend > 20% of their time in memory management. So a 10% improvement would be quite > good actually. Absolutely. It's 10% this, then 10% hashing function, then 10% (or more!) the right data structure etc., and it all adds up in the end. > > Speed is really hard to measure though because there is so much variation. > "Pick the fastest" assumes there is a "fastest" approach - which does not > exist in general, there's always the odd program where one otherwise bad > approach will fly and all others suck. > > I hate that C++ doesn't really allow a copying GC. That approach is the one > where the differences are bound to be significant. > > On http://stackoverflow.com/questions/8251645/generational-gc-source-code > I'm seeing Qish, which does some heavy (but maybe Sympy-compatible?) > restrictions on what C++ is allowed to do, and copying. I just don't know > how optimized Qish is. > That page generally lists some more GC approaches. > > The Boehm collector is still mentioned so it hasn't gone out of favor. It > would be interesting to see how it fares with the non-RC variant of your > code. Ondrej -- You received this message because you are subscribed to the Google Groups "sympy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/sympy. For more options, visit https://groups.google.com/groups/opt_out.
