Sverker Nilsson wrote:
But I don't think I would want to risk breaking someone's code just for
this when we could just add a new method.

I don't think anyone will be relying on StopIteration being raised.
If you're worried, do the next release as a 0.10.0 release and explain the backwards incompatible change in the release announcement.

Or we could have an option to hpy() to redefine load() as loadall(), but
I think it is cleaner (and easier) to just define a new method...

-1 to options to hpy, +1 to loadall, but also -1 to leaving load() as broken as it is...
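
Something like this would do it, as a sketch (assuming load() takes an
open file and raises StopIteration at the end of the input, as described
above):

from guppy import hpy

def loadall(f):
    # Sketch: assumes hpy().load() reads one Stat record per call
    # from an open file and raises StopIteration at end of input.
    h = hpy()
    while True:
        try:
            yield h.load(f)
        except StopIteration:
            return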

As the enclosing class or frame is deallocated, so is its attribute h
itself.

Right, but as long as the h hangs around, it hangs on to all the memory it's used to build its stats, right? This caused me problems in my most recent use of guppy...

themselves, but I am talking about more severe data that can be hundreds
of megabytes or more).

Me too ;-) I've been profiling situations where the memory usage was over 1GB for processing a 30MB file when I started ;-)

For example, the setref() method sets a reference point somewhere in h.
Further calls to heap() would report only objects allocated after that
call. But you could use a new hpy() instance to see all objects again.

Multiple threads come to mind, where each thread would have its own
hpy() object. (Thread safety may still be a problem but at least it
should be improved by not sharing the hpy() structures.)

Even in the absence of multiple threads, you might have an outer
invocation of hpy() that is used for global analysis, with its specific
options, setref()'s etc, and inner invocations that make some local
analysis perhaps in a single method.

Fair points :-)
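
To make that concrete, here's a sketch of the pattern being described
(setref() and heap() as above; the allocation line is just illustrative):

from guppy import hpy

h = hpy()
h.setref()                       # set a reference point in h
x = [[] for i in range(1000)]    # ...allocate some objects...
print(h.heap())                  # reports only objects allocated since setref()

h2 = hpy()                       # a fresh instance, with its own state
print(h2.heap())                 # sees all objects again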

http://guppy-pe.sourceforge.net/heapy-thesis.pdf

I'm afraid, while I'd love to, I don't have the time to read a thesis...

But it is (an important) part of the documentation.

That may be, but I'd wager a fair amount of beer that by far the most common uses for heapy are:

- finding out what's using the memory consumed by a python process

- logging what the memory consumption is made up of while running a large python process

- finding out how much memory is being used

...in that order. Usually on a very tight deadline and with unhappy users breathing down their necks. At times like that, reading a thesis doesn't really figure into it ;-)

I'm afraid, while I'd love to, I don't have the time to duplicate the
thesis here...;-)

I don't think that would help. Succinct help and easy-to-use functions to solve those 3 cases above are all that's needed ;-)
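
For the record, the sort of minimal incantations I have in mind for those
3 cases, as a sketch (I'm assuming .size is the total byte count, as I
remember the API):

from guppy import hpy

h = hpy()

# 1. finding out what's using the memory:
print(h.heap())           # heap breakdown by kind

# 2. logging what the consumption is made up of while running:
h.setref()
# ... run the next chunk of work ...
print(h.heap())           # what's been allocated since the reference point

# 3. finding out how much memory is being used:
print(h.heap().size)      # total size in bytes of the set's objects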

Do you mean we should actually _remove_ features to create a new
standalone system?

Absolutely, why provide more than is used or needed?

You are free to wrap functions as you find suitable; a minimal wrapper
module could be as simple as this:

# Module heapyheap
from guppy import hpy
h=hpy()
heap=heap()

I don't follow this.. did you mean heap = h.heap()? If so, isn't that using all the gubbinz in Use, etc, anyway?
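
i.e. presumably the intended wrapper was something like this (my guess at
the correction: binding the method rather than its result):

# Module heapyheap
from guppy import hpy
h = hpy()
heap = h.heap        # the bound method, so heap() takes a snapshot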

Less minor rant: this applies to most things to do with heapy... Having __repr__ return the same as __str__, and having that be a long lump of text, is rather annoying. If you really must, make __str__ return the big lump of text, but have __repr__ return a simple, short string containing the class, the id, and maybe the number of contained objects...

I thought it was cool to not have to use print but get the result
directly at the prompt.

That's fine, that's what __str__ is for. __repr__ should be short.

No, it's the other way around: __repr__ is used when evaluating directly
at the prompt.

The docs give the idea:

http://docs.python.org/reference/datamodel.html?highlight=__repr__#object.__repr__

I believe your "big strings" would be classed as "informal" and so would be computed by __str__.
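
i.e. the usual split, along these lines (a sketch with a hypothetical
class, just to show the shape):

class StatsView(object):
    # hypothetical class, standing in for something like IdentitySet
    def __init__(self, elements):
        self.elements = list(elements)

    def __repr__(self):
        # short and unambiguous: class, id, number of contained objects
        return '<%s 0x%x, %d objects>' % (
            type(self).__name__, id(self), len(self.elements))

    def __str__(self):
        # the big lump of text, shown when you explicitly print
        return '\n'.join(repr(e) for e in self.elements)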


Yeah, but an item in a set is not a set. __getitem__ should return an item, not a subset...

Usually I think it is called an 'element' of a set rather than an
'item'. Python builtin sets can't even do indexing at all.

...'cos it doesn't make sense ;-)

Likewise, Heapy IdentitySet objects don't support indexing to get at the
elements directly.

...then they shouldn't have a __getitem__ method!

The index (__getitem__) method was available so I
used it to take the subset of the i'ths row in the partition defined by
its equivalence order.

That should have another name... I don't know what a partition or equivalence order are in the contexts you're using them, but I do know that hijacking __getitem__ for this is wrong.

The subset indexing, being the more well-defined operation, and also
IMHO more generally useful, thus got the honor to have the [] syntax.

Except it misleads anyone who's programmed in Python for a significant period of time and causes problems when combined with the bug in .load :-(

It would just be another syntax. I don't see the conceptual problem
since e.g. indexing works just fine like this with strings.

Strings are a bad example...

objects. Each row is still an IdentitySet, and has the same attributes.

Why? It's semantically different.

No, it's semantically identical. :-)

Each row is an IdentitySet just like the top-level set, but one which
happens to contain elements of one particular kind, as defined by the
equivalence relation in use. So it has only 1 row. The equivalence
relation can be changed by creating a new set using one of the .byxxx
attributes: the set could then be made to contain many kinds of objects
again, getting more rows, even though the objects themselves don't
change.
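
In other words, something like this sketch (byrcs is one of the .byxxx
attributes; the comments describe the intended behaviour):

from guppy import hpy

h = hpy()
s = h.heap()       # an IdentitySet, partitioned by type by default
row = s[0]         # the subset on the first row: still an IdentitySet
print(row)         # one kind of object only, so it shows a single row
print(row.byrcs)   # same objects re-partitioned by referrers: many rows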

Fine, I'll stop arguing, but just be aware that this is confusing and you're likely the only person who understands what's really going on or how it's supposed to work...

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
           - http://www.simplistix.co.uk