>>> ``get_tracemalloc_memory()`` function:
>>>
>>>     Get the memory usage in bytes of the ``tracemalloc`` module as a
>>>     tuple: ``(size: int, free: int)``.
>>>
>>>     * *size*: total size in bytes allocated by the module,
>>>       including *free* bytes
>>>     * *free*: number of free bytes available to store data
>>
>> What's *free* exactly? I assume it's linked to the internal storage
>> area used by tracemalloc itself, but that's not clear at all.
>>
>> Also, is the tracemalloc overhead included in the above stats (I'm
>> mainly thinking about get_stats() and get_traced_memory())?
>> If yes, I find it somewhat confusing: for example, AFAICT, valgrind's
>> memcheck doesn't report the memory overhead, although it can be quite
>> large, simply because it's not interesting.
>
> My goal is to be able to explain how *every* byte is allocated in Python.
> If you enable tracemalloc, your RSS memory will double, or something
> like that. You can use get_tracemalloc_memory() to add metrics to a
> snapshot. It helps to understand how the RSS memory evolves.
>
> Basically, get_tracemalloc_memory() is the memory used to store traces.
> It's something internal to the C module (_tracemalloc). This memory is
> not traced because it *is* the traces... and so is not counted in
> get_traced_memory().
>
> The issue is probably the name (or maybe also the doc): would you
> prefer get_python_memory() / get_traces_memory() names, instead of
> get_traced_memory() / get_tracemalloc_memory()?

No, the names are fine as-is.
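
For reference, here is a rough sketch of the distinction (I'm assuming
the names above, and treating get_tracemalloc_memory() as returning a
single int total rather than the (size, free) tuple):

```python
import tracemalloc

tracemalloc.start()

# Allocate some Python objects so there is something to trace.
data = [bytes(1000) for _ in range(1000)]

# Memory of traced Python allocations (current, peak)...
traced_size, traced_peak = tracemalloc.get_traced_memory()
# ...versus the memory tracemalloc itself uses to store the traces,
# which is *not* counted in the traced total.
overhead = tracemalloc.get_tracemalloc_memory()

print("traced: %d bytes, tracemalloc overhead: %d bytes"
      % (traced_size, overhead))

tracemalloc.stop()
```

The point being that the overhead figure accounts for the part of the
RSS growth that the traced total doesn't explain.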

> FYI Objects allocated in tracemalloc.py (real objects, not traces) are
> not counted in get_traced_memory() because of a filter set up by
> default (it was not the case in previous versions of the PEP). You can
> remove the filter using tracemalloc.clear_filters() to see this
> memory. There are two exceptions: Python objects created for the
> result of get_traces() and get_stats() are never traced for
> efficiency. It *is* possible to trace these objects, but it's really
> too slow. get_traces() and get_stats() may be called outside
> tracemalloc.py, so another filter would be needed. Well, it's easier
> to never trace these objects. Anyway, they are not useful for
> understanding where your application leaks memory.

Perfect, that's all I wanted to know.

> get_object_trace(obj) is a shortcut for
> get_trace(get_object_address(obj)). I agree that the wrong size
> information can be surprising.
>
> I can delete get_object_trace(), or rename the function to
> get_object_traceback() and modify it to only return the traceback.
>
> I prefer to keep the function (modified for get_object_traceback).
> tracemalloc can be combined with other tools like Meliae, Heapy or
> objgraph to cross-reference information. When you find an interesting
> object with these tools, you may be interested to know where it was
> with these tools, you may be interested to know where it was
> allocated.

If you mean modify it to return only the trace, then that's fine.
As for the name, traceback does indeed sound less confusing than
trace, but we should just make sure that the names are consistent
across the API (i.e. always use "trace" or always use "traceback",
not both).

>>> ``get_trace(address)`` function:
>>>
>>>     Get the trace of a memory block as a ``(size: int, traceback)``
>>>     tuple where *traceback* is a tuple of ``(filename: str, lineno:
>>>     int)`` tuples; *filename* and *lineno* can be ``None``.
>>>
>>>     Return ``None`` if the ``tracemalloc`` module did not trace the
>>>     allocation of the memory block.
>>>
>>>     See also ``get_object_trace()``, ``get_stats()`` and
>>>     ``get_traces()`` functions.
>>
>> Do you have example use cases where you want to work with raw addresses?
>
> An address is the unique key to identify a memory block. In Python,
> you don't manipulate memory blocks directly; that's why there is a
> get_object_address() function (to link objects to traces).
>
> I added get_trace() because get_traces() is very slow. It would be
> stupid to call it if you only need one trace of a memory block.
>
> I'm not sure that this function is really useful. I added it to work
> around the performance issue, and because I believe that someone will
> need it later :-)
>
> What do you suggest for this function?

Well, I can certainly find a use case for get_object_trace(): even if
it uses get_trace() internally, I'm not convinced that the latter is
useful.
If we cannot come up with a use case for working with raw addresses,
I'm tempted to just keep get_object_trace() public, and make
get_object_address() and get_trace() private.
In short, don't make any address-manipulating function public.

>> Are those ``match`` methods really necessary for the end user, i.e.
>> are they worth being exposed as part of the public API?
>
> (Oh, I just realized that match_lineno() may lead to bugs, so I removed it.)
>
> Initially, I exposed the methods for unit tests. Later, I used them in
> Snapshot.apply_filters() to factorize the code (before, I had two
> implementations to match a filter: one in C, another in Python).
>
> I see tracemalloc more as a library; I don't know yet how it will be
> used by new tools built on top of it. Snapshot is more a helper
> (convenience class) than a mandatory API to use tracemalloc. You
> might want to use filters directly to analyze the raw data.
>
> Users are supposed to use tracemalloc.add_filter() and
> Snapshot.apply_filters(). Would you prefer to keep them private and
> not document them? I don't have a strong opinion on this point.

IIUC, you only use those match methods for tests and internally for
code factorization: IMO, that's a hint they shouldn't be made public.

I usually follow a simple rule of thumb for APIs: if you can't come up
with a good use case for something, it shouldn't be made public. That
leads to simpler, streamlined APIs, which are much more pleasant to
learn and use. Also, one can always add a new method to an API, but
it's impossible to remove one without breaking backward compatibility.
So I'd say leave them out for now, and we'll see with time if they're
really necessary.
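
For illustration, the public surface could then look like this sketch
(I'm assuming Filter objects plus a filtering method on Snapshot; the
method name filter_traces() here is an assumption, the draft calls it
apply_filters()):

```python
import tracemalloc

tracemalloc.start()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Exclude allocations made by the import machinery; the match logic
# stays internal to the Filter objects, so no match_*() methods are
# needed in the public API.
filtered = snapshot.filter_traces([
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
])
print(len(filtered.traces), "traces after filtering")
```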

>>> StatsDiff
>>> ---------
>
> Well, StatsDiff is useless :-) I just removed it.

Great, that's also what I thought.

> I modified GroupedStats.compare_to() to sort differences by default,
> but I added a sort parameter to get the list unsorted. sort=False can
> be used to sort differences differently (sorting the list twice would
> be inefficient).

OK, that's a good compromise.

> Another option would be to add sort_key and sort_reverse parameters.

IMO that's overkill: the user can call sort() on the list if they want
a special-purpose order.
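
To sketch both paths (assuming a compare_to() on snapshots that
returns a list sorted by biggest difference by default; re-sorting
manually covers any special-purpose order):

```python
import tracemalloc

tracemalloc.start()
snap1 = tracemalloc.take_snapshot()
data = [bytes(500) for _ in range(500)]  # allocate between snapshots
snap2 = tracemalloc.take_snapshot()
tracemalloc.stop()

# Sorted by default: biggest differences first.
diffs = snap2.compare_to(snap1, 'lineno')

# Re-sort for a special-purpose order, e.g. by count of new blocks.
by_count = sorted(diffs, key=lambda d: d.count_diff, reverse=True)
print(diffs[0])
```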

>>> Snapshot
>>> --------
>>>
>>> ``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats:
>>> dict=None)`` class:
>>>
>>>     Snapshot of traces and statistics on memory blocks allocated by Python.
>>
>>
>> I'm confused.
>> Why are get_trace(), get_object_trace(), get_stats() etc not methods
>> of a Snapshot object?
>
> get_stats() returns the current stats. If you call it twice, you get
> different results. The principle of a snapshot is to be frozen: stats,
> traces and metrics are read once, when the snapshot is created.
>
> To get stats of a snapshot, just read its stats attribute. To get a
> trace, it's snapshot.traces[address].
>
>> Is it because you don't store all the necessary information in a
>> snapshot, or are they just some sort of shorthands, like:
>> stats = get_stats()
>> vs
>> snapshot = Snapshot.create()
>> stats = snapshot.stats
>
> I have already used other tools like Meliae and Heapy, and it's
> convenient to have access to the raw data to compute my own views
> manually. I don't want to force users to use the high-level API
> (Snapshot).
>
> Is it a problem to have two APIs (low-level like get_stats() and
> high-level like Snapshot) for similar use cases? What do you suggest?

I didn't understand your above explanation: could get_stats() be
implemented atop a snapshot, or not?

> => Done, I renamed Snapshot.write() to Snapshot.dump().
>
> By the way, load() and dump() are limited to filenames (string).
> Should they accept file-like objects? isinstance(filename, str) may be
> used to check if the parameter is a filename or an open file object.

IMO, they're fine as-is.
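
For completeness, the filename-based round trip, as a sketch (the
tempfile path is just for the example):

```python
import os
import tempfile
import tracemalloc

tracemalloc.start()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# dump()/load() take a filename, not a file object.
path = os.path.join(tempfile.mkdtemp(), "app.tracemalloc")
snapshot.dump(path)
reloaded = tracemalloc.Snapshot.load(path)
print(len(reloaded.traces), "traces reloaded")
```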

>>> Metric
>>> ------
>>>
>>> ``Metric(name: str, value: int, format: str)`` class:
>>>
>>>     Value of a metric when a snapshot is created.
>>
>> Alright, what's a metric again ;-) ?
>>
>> I don't know if it's customary, but having short examples would IMO be nice.
>
> => done, I improved the doc

Humm...
I didn't see any metric example at
http://www.haypocalc.com/tmp/tracemalloc/library/tracemalloc.html
Is it just me?

cf
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev