On Sat, Oct 5, 2013 at 5:03 AM, Joachim Durchholz <[email protected]> wrote:
> Am 04.10.2013 19:37, schrieb Ondřej Čertík:
>
>> On Thu, Oct 3, 2013 at 12:40 PM, Joachim Durchholz <[email protected]>
>> wrote:
>>>
>>> Am 03.10.2013 17:04, schrieb Ondřej Čertík:
>>>>
>>>>
>>>> On Thu, Oct 3, 2013 at 1:50 AM, Joachim Durchholz <[email protected]>
>>>> wrote:
>>>>>
>>>>>
>>>>> Am 03.10.2013 07:46, schrieb Ondřej Čertík:
>>>>
>>>>
>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>> Garbage management is a real issue in C++.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Currently I use reference counting. I don't see any issue --- can you
>>>>>> elaborate
>>>>>> on what you mean?
>>>>>
>>>>>
>>>>>
>>>>> See http://www.hpl.hp.com/personal/Hans_Boehm/gc/issues.html for issues
>>>>> with
>>>>> and around the BDW collector.
>>>>>
>>>>> There's also the seminal Zorn paper "Memory Allocation Costs in Large C
>>>>> and
>>>>> C++ Programs" [...]
>>>>
>>>>
>>>>
>>>> I see, thanks for the pointers. They are a bit old, so maybe things
>>>> have improved since then.
>>>
>>>
>>>
>>> I think both mark&sweep and reference-counting approaches have improved
>>> since that paper, but the relative performance is still the same.
>>>
>>> RC does have its niches, but the problem is that as applications evolve,
>>> they tend to leave the RC-friendly niche and then you need to rewrite all
>>> the allocation code because the RC scheme is visible over all of the
>>> code.
>>> Hmm... okay, you could probably stub out whatever SmartPointer<> class
>>> you're using, dropping all the reference counting code, the variables,
>>> and
>>> the free calls, and see what the performance is. Your code might still be
>>> geared towards being RC-friendly instead of mark&sweep-friendly, but it
>>> would give at least a preliminary result.
>>>
>>> In fact it would be interesting to see how the performance figures have
>>> evolved since the papers were written.
>>
>>
>> Good ideas --- here is the RCP class that I created for CSymPy:
>>
>> https://github.com/certik/csympy/blob/master/src/csympy_rcp.h#L59
>>
>> $ ./expand2
>> Expanding: ((y + x + z + w)^15 + w)*(y + x + z + w)^15
>> 1217ms
>> number of terms: 6272
>>
>>
>> So I deleted all reference counting and deleting of objects (patch
>> here https://gist.github.com/certik/6829650)
>> --- but I left the RCP class for now. Just to get an idea.
>>
>> $ ./expand2
>> Expanding: ((y + x + z + w)^15 + w)*(y + x + z + w)^15
>> 1382ms
>> number of terms: 6272
>>
>>
>> And things got slower... Haha, that was unexpected. Why would that be?
>
>
> None at all... except that benchmarking is difficult :-)
>
> Did you run your test multiple times?

Yes. I run it about 10 to 15 times in two terminals with and without the feature
that I want to measure and I report the best times in both.

> A difference for one-off runs can have a multitude of reasons that are
> related to the system's state, you need a warm-up phase and you need to
> observe the variance.

I don't measure variance, only by looking at it. Do you see a problem
with reporting the best time?

> Observing actual memory usage might also be
> interesting, just to get an idea how much of a role memory management
> actually plays in expand2. Your RCP class could count memory sizes and
> delete requests.

Good point, I should play with that.

>
> Maybe this result indicates that questions of garbage collection are
> irrelevant for expand2, possibly because it generates almost no garbage.
> OTOH if the program is generating lots of garbage, it may be that the
> program spent a lot of time increasing the heap, setting up data structures
> and zeroing out memory.
>
>
>> Based on my experience, one always has to try various approaches and
>> pick the fastest.
>> RC has been pretty robust in my experience, but I will try some GC
>> approaches
>> as well. I feel there is at least 10% maybe more potential for speedup
>> thanks to
>> better memory management.
>
>
> As I said - profiling indicates that allocation-intensive programs can spend
> 20% of their time in memory management. So a 10% improvement would be quite
> good actually.

Absolutely. It's 10% this, then 10% hashing function, then 10% (or
more!) the right data structure
etc., and it all adds up in the end.

>
> Speed is really hard to measure though because there is so much variation.
> "Pick the fastest" assumes there is a "fastest" approach - which does not
> exist in general, there's always the odd program where one otherwise bad
> approach will fly and all others suck.
>
> I hate that C++ doesn't really allow a copying GC. That approach is the one
> where the differences are bound to be significant.
>
> On http://stackoverflow.com/questions/8251645/generational-gc-source-code
> I'm seeing Qish, which does some heavy (but maybe Sympy-compatible?)
> restrictions on what C++ is allowed to do, and copying. I just don't know
> how optimized Qish is.
> That page generally lists some more GC approaches.
>
> The Boehm collector is still mentioned so it hasn't gone out of favor. It
> would be interesting to see how it fares with the non-RC variant of your
> code.

Ondrej

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sympy.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to