I started writing this post asking for assistance on using Heapy for
studying my Sage application's memory usage. But I think that a more
important question relates to my use of @parallel('multiprocessing').
So here goes.
Section 1.
On an 8-core machine, I kick off a function (which has been decorated
with @parallel('multiprocessing')) with 16 arguments. (BTW, Sage
outrocks all---the sirens of multicore have given Matlab-addicted
colleagues weeks of impotent rage; me, only a couple of hours of
searching and finding "parallel?". The creators of Sage will have a
special place in heaven.)
Anyway, when 16 arguments are kicked to this parallel function (which
creates large vectors and reduces them to a scalar), I see 8 or so
sage processes in "top", but one of them stands out: its memory
footprint steadily increases in small 1-3% increments to about 12% of
memory. It stays at 12% after the sage script finishes. If I kick
off 100 parallel runs, I get MemoryErrors due to excessive RAM use
(and Sage doesn't recover: it hangs, and Control-C raises
KeyboardInterrupt exceptions but never cleanly shuts it down, so I
have to kill the screen session).
But if I decorate my function with @parallel('reference'), that is, to
force serial execution, the Sage process at the end of the same number
of runs takes only 1.5% of memory. (It just takes a lot more time.)
Is there an issue with @parallel with multiprocessing and garbage
collection? Why would Sage sop up so much memory after parallel runs
of a function that solely returns a scalar? (The function does take as
its input a small compound object, deepcopy's it, modifies the copy,
generates large vectors from the original and modified objects and
reduces them to a scalar, which it returns. I haven't tried making
Python explicitly garbage-collect the intermediate variables it
generates because that sounds risky.)
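
To make the pattern concrete, here is a minimal sketch of the shape of
the workload. It is not my actual code: it uses the standard library's
multiprocessing.Pool directly rather than Sage's @parallel, and the
names score, run_parallel, "scale" and "n" are hypothetical stand-ins.
It also shows two things I could try, on the assumption that the
memory is being held by long-lived worker processes: an explicit
gc.collect() (which only frees objects that are already unreachable)
and maxtasksperchild=1, which makes each worker exit after one task so
that its memory goes back to the operating system.

```python
import copy
import gc
import multiprocessing as mp


def score(params):
    """Hypothetical stand-in for the real function.

    Copies the input, modifies the copy, builds two large vectors
    (one from the original, one from the copy) and reduces them to
    a single scalar.
    """
    modified = copy.deepcopy(params)
    modified["scale"] += 1  # the original object is left untouched
    n = params["n"]
    original_vec = [params["scale"] * i for i in range(n)]
    modified_vec = [modified["scale"] * i for i in range(n)]
    result = sum(b - a for a, b in zip(original_vec, modified_vec))
    # Drop the large intermediates and collect before returning.
    del original_vec, modified_vec
    gc.collect()
    return result


def run_parallel(arg_list, processes=2):
    # maxtasksperchild=1: each worker process is replaced after it
    # finishes one task, so nothing it allocated outlives that task.
    with mp.Pool(processes=processes, maxtasksperchild=1) as pool:
        return pool.map(score, arg_list)


if __name__ == "__main__":
    args = [{"scale": k, "n": 10000} for k in range(16)]
    print(run_parallel(args))
```

Whether @parallel('multiprocessing') behaves like a pool with
long-lived workers is exactly what I don't know, so this is only a way
to test the hypothesis outside Sage.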
Section 2.
In a bid to figure out what Python objects are taking so much RAM, I
installed Guppy and am trying to use Heapy
(http://guppy-pe.sourceforge.net/heapy_tutorial.html). I cannot
understand its output, and "from guppy import hpy; hp=hpy(); hp.heap()"
tells me that the process showing 12% memory usage (of 8 GB RAM) is
using just 40 MB of memory. If anyone knows how to interpret its
output, I'd be much obliged:
sage: hp.heap()
Partition of a set of 390118 objects. Total size = 42345660 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0 168567  43 24633284  58   24633284  58 str
     1  93441  24  3740796   9   28374080  67 tuple
     2   1593   0  2051016   5   30425096  72 dict of module
     3  14869   4  2022184   5   32447280  77 dict of numpy.core.defmatrix.matrix
     4  25218   6  1714824   4   34162104  81 types.CodeType
     5  24449   6  1369144   3   35531248  84 function
     6   2518   1  1281712   3   36812960  87 dict of type
     7   2301   1  1167912   3   37980872  90 dict (no owner)
     8   2520   1  1094184   3   39075056  92 type
     9  14869   4   832664   2   39907720  94 numpy.core.defmatrix.matrix
<839 more rows. Type e.g. '_.more' to view.>
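
For scale: 12% of 8 GB is roughly 980 MB, while the 42345660 bytes
above is about 40 MB, so the two numbers are clearly measuring
different things. "top" reports the resident memory of a whole
process as the operating system sees it, whereas a tool that walks
Python objects only counts the objects it can see in the process where
it is called. A minimal sketch of that distinction, using only the
standard library (not Heapy), with hypothetical helper names
peak_rss and python_object_bytes; the resource module is Unix-only,
and ru_maxrss is in kilobytes on Linux but bytes on macOS:

```python
import resource
import sys


def peak_rss(who=resource.RUSAGE_SELF):
    """Peak resident set size as reported by the operating system.

    Units are platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(who).ru_maxrss


def python_object_bytes(objects):
    """Sum of the shallow sizes of the given Python objects."""
    return sum(sys.getsizeof(obj) for obj in objects)


if __name__ == "__main__":
    data = [float(i) for i in range(100000)]
    # OS view of this process.
    print("peak RSS, this process:", peak_rss())
    # OS view of child processes that have already exited and been
    # waited for; still-running workers are not included here.
    print("peak RSS, finished children:",
          peak_rss(resource.RUSAGE_CHILDREN))
    # Python-object view: the list itself plus the floats it holds.
    print("shallow bytes in data:", python_object_bytes([data] + data))
```

If the process-level number is far above the Python-object number,
as it is here, my guess is that the memory sits somewhere the object
walker does not count (for example buffers allocated by C extensions,
or memory that was freed but not returned to the OS), but that is an
assumption I have not verified.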
Thanks very much!