Sverker Nilsson wrote:
But I don't think I would want to risk breaking someone's code just for
this when we could just add a new method.

I don't think anyone will be relying on StopIteration being raised.
If you're worried, do the next release as a 0.10.0 release and explain the backwards incompatible change in the release announcement.

Or we could have an option to hpy() to redefine load() as loadall(), but
I think it is cleaner (and easier) to just define a new method...

-1 to options to hpy, +1 to loadall, but also -1 to leaving load() as broken as it is...
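
Something like this would do it, as a sketch (assuming load() takes an
open file and raises StopIteration at the end of the input, as described
above):

from guppy import hpy

def loadall(f):
    # Sketch: assumes hpy().load() reads one Stat record per call
    # from an open file and raises StopIteration at end of input.
    h = hpy()
    while True:
        try:
            yield h.load(f)
        except StopIteration:
            return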

As the enclosing class or frame is deallocated, so is its attribute h
itself.

Right, but as long as the h hangs around, it hangs on to all the memory it's used to build its stats, right? This caused me problems in my most recent use of guppy...

themselves, but I am talking about more severe data that can be hundreds
of megabytes or more).

Me too ;-) I've been profiling situations where the memory usage was over 1GB for processing a 30MB file when I started ;-)

For example, the setref() method sets a reference point somewhere in h.
Further calls to heap() would report only objects allocated after that
call. But you could use a new hpy() instance to see all objects again.

Multiple threads come to mind, where each thread would have its own
hpy() object. (Thread safety may still be a problem but at least it
should be improved by not sharing the hpy() structures.)

Even in the absence of multiple threads, you might have an outer
invocation of hpy() that is used for global analysis, with its specific
options, setref()'s etc, and inner invocations that make some local
analysis perhaps in a single method.

Fair points :-)
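
To make that concrete, here's a sketch of the pattern being described
(setref() and heap() as above; the allocation line is just illustrative):

from guppy import hpy

h = hpy()
h.setref()                       # set a reference point in h
x = [[] for i in range(1000)]    # ...allocate some objects...
print(h.heap())                  # reports only objects allocated since setref()

h2 = hpy()                       # a fresh instance, with its own state
print(h2.heap())                 # sees all objects again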

http://guppy-pe.sourceforge.net/heapy-thesis.pdf

I'm afraid, while I'd love to, I don't have the time to read a thesis...

But it is (an important) part of the documentation.

That may be, but I'd wager a fair amount of beer that by far the most common uses for heapy are:

- finding out what's using the memory consumed by a python process

- logging what the memory consumption is made up of while running a large python process

- finding out how much memory is being used

...in that order. Usually on a very tight deadline and with unhappy users breathing down their necks. At times like that, reading a thesis doesn't really figure into it ;-)

I'm afraid, while I'd love to, I don't have the time to duplicate the
thesis here...;-)

I don't think that would help. Succinct help and easy-to-use functions to solve those 3 cases above are all that's needed ;-)
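
For the record, the sort of minimal incantations I have in mind for those
3 cases, as a sketch (I'm assuming .size is the total byte count, as I
remember the API):

from guppy import hpy

h = hpy()

# 1. finding out what's using the memory:
print(h.heap())           # heap breakdown by kind

# 2. logging what the consumption is made up of while running:
h.setref()
# ... run the next chunk of work ...
print(h.heap())           # what's been allocated since the reference point

# 3. finding out how much memory is being used:
print(h.heap().size)      # total size in bytes of the set's objects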

Do you mean we should actually _remove_ features to create a new
standalone system?

Absolutely, why provide more than is used or needed?

You are free to wrap functions as you find suitable; a minimal wrapper
module could be as simple as this:

# Module heapyheap
from guppy import hpy
h=hpy()
heap=heap()

I don't follow this.. did you mean heap = h.heap()? If so, isn't that using all the gubbinz in Use, etc, anyway?
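
i.e. presumably the intended wrapper was something like this (my guess at
the correction: binding the method rather than its result):

# Module heapyheap
from guppy import hpy
h = hpy()
heap = h.heap        # the bound method, so heap() takes a snapshot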

Less minor rant: this applies to most things to do with heapy... Having __repr__ return the same as __str__, and having that be a long lump of text, is rather annoying. If you really must, make __str__ return the big lump of text, but have __repr__ return a simple, short string containing the class, the id, and maybe the number of contained objects...

I thought it was cool to not have to use print but get the result
directly at the prompt.

That's fine, that's what __str__ is for. __repr__ should be short.

No, it's the other way around: __repr__ is used when evaluating directly
at the prompt.

The docs give the idea:

http://docs.python.org/reference/datamodel.html?highlight=__repr__#object.__repr__

I believe your "big strings" would be classed as "informal" and so would be computed by __str__.
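
i.e. the usual split, along these lines (a sketch with a hypothetical
class, just to show the shape):

class StatsView(object):
    # hypothetical class, standing in for something like IdentitySet
    def __init__(self, elements):
        self.elements = list(elements)

    def __repr__(self):
        # short and unambiguous: class, id, number of contained objects
        return '<%s 0x%x, %d objects>' % (
            type(self).__name__, id(self), len(self.elements))

    def __str__(self):
        # the big lump of text, shown when you explicitly print
        return '\n'.join(repr(e) for e in self.elements)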


Yeah, but an item in a set is not a set. __getitem__ should return an item, not a subset...

Usually I think it is called an 'element' of a set rather than an
'item'. Python builtin sets can't even do indexing at all.

...'cos it doesn't make sense ;-)

Likewise, Heapy IdentitySet objects don't support indexing to get at the
elements directly.

...then they shouldn't have a __getitem__ method!

The index (__getitem__) method was available so I
used it to take the subset of the i'ths row in the partition defined by
its equivalence order.

That should have another name... I don't know what a partition or equivalence order are in the contexts you're using them, but I do know that hijacking __getitem__ for this is wrong.

The subset indexing, being the more well-defined operation, and also
IMHO more generally useful, thus got the honor to have the [] syntax.

Except it misleads anyone who's programmed in Python for a significant period of time and causes problems when combined with the bug in .load :-(

It would just be another syntax. I don't see the conceptual problem
since e.g. indexing works just fine like this with strings.

Strings are a bad example...

objects. Each row is still an IdentitySet, and has the same attributes.

Why? It's semantically different.

No, it's semantically identical. :-)

Each row is an IdentitySet just like the top-level set, but one which
happens to contain elements of one particular kind, as defined by the
equivalence relation in use. So it has only 1 row. The equivalence
relation can be changed by creating a new set using one of the .byxxx
attributes: the set could then be made to contain many kinds of objects
again, getting more rows, even though the objects themselves don't
change.
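
In other words, something like this sketch (byrcs is one of the .byxxx
attributes; the comments describe the intended behaviour):

from guppy import hpy

h = hpy()
s = h.heap()       # an IdentitySet, partitioned by type by default
row = s[0]         # the subset on the first row: still an IdentitySet
print(row)         # one kind of object only, so it shows a single row
print(row.byrcs)   # same objects re-partitioned by referrers: many rows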

Fine, I'll stop arguing, but just be aware that this is confusing and you're likely the only person who understands what's really going on or how it's supposed to work...

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
           - http://www.simplistix.co.uk