Hi,

The following function sounds completely reasonable. It shouldn't be hard
to implement (a few lines of C code).

    def reset_peak_memory():
        # in _tracemalloc.c
        tracemalloc_peak_trace_memory = tracemalloc_traced_memory;

Resetting the peak to tracemalloc_traced_memory is correct :-)

Can you please open an issue at https://bugs.python.org/ to request
the feature? Do you want to implement it?

Please put me (vstinner) on the nosy list of the issue. I wrote
tracemalloc, so I could help you implement the feature ;-)
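
To make the intended semantics concrete, here is a tiny pure-Python model of the two counters. This is only a sketch: the names loosely mirror the C globals, and the real change in _tracemalloc.c would also need the locking and wrapping that Huon mentions below.

```python
# Toy model of tracemalloc's traced/peak counters, illustrating the
# proposed reset_peak_memory() semantics.  A sketch only, not the real
# _tracemalloc.c implementation.
class PeakTracker:
    def __init__(self):
        self.traced = 0  # current traced memory, in bytes
        self.peak = 0    # high-water mark since start (or last reset)

    def alloc(self, size):
        self.traced += size
        self.peak = max(self.peak, self.traced)

    def free(self, size):
        self.traced -= size

    def reset_peak(self):
        # The one-line core of the proposal: drop the high-water mark
        # back to the current traced size, without touching the traces.
        self.peak = self.traced


t = PeakTracker()
t.alloc(1000)   # spike we want to ignore
t.free(900)     # traced is now 100, peak is still 1000
t.reset_peak()  # peak drops to 100
t.alloc(50)     # the peak we actually care about: 150
```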

Victor

On Thu, 14 May 2020 at 15:06, <wilson.h...@gmail.com> wrote:
>
> Hi,
>
> It would be helpful for us if tracemalloc had a function that resets the peak 
> memory usage counter, without clearing the current traces. At the moment, I 
> don't think there's a way to find the peak memory of a subset of the code 
> since the initial tracemalloc.start() call, without calling 
> tracemalloc.clear_traces(). The latter disturbs other parts of the tracing.
>
> Specifically, it might be a function like (pseudo-implementation):
>
>     def reset_peak_memory():
>         # in _tracemalloc.c
>         tracemalloc_peak_trace_memory = tracemalloc_traced_memory;
>
> This would allow easily determining the peak memory usage of a specific piece 
> of code, without disturbing all of the traces. For example, the following 
> would set specific_peak to the highest size of traced memory of just line X:
>
>     tracemalloc.start()
>     # ... code where allocations matter, but the peak does not ...
>     peak_memory_doesnt_matter()
>
>     tracemalloc.reset_peak_memory()
>     peak_memory_is_important() # X
>     _, specific_peak = tracemalloc.get_traced_memory()
>
>     # ... more code with relevant allocations ...
>     peak_memory_doesnt_matter()
>
>     tracemalloc.stop()
>
> As sketched above, the implementation of this should be quite small, with the 
> core being the line mentioned above, plus all the required extras (locking, 
> wrapping, documentation, tests, ...). Thoughts?
>
>
> Full motivation for why we want to do this:
>
> In <https://github.com/stellargraph/stellargraph>, we're using the 
> tracemalloc module to understand the memory usage of our core StellarGraph 
> graph class (a nodes-and-edges graph, not a plot, to be clear). It stores 
> some NumPy arrays of feature vectors associated with each node in the graph, 
> along with all of the edge information. Any of these pieces can be large, and 
> we want to keep the resource usage as small as possible. We're monitoring 
> this by instrumenting the construction: start from a raw set of nodes 
> (including potentially large amounts of features) and edges, and build a 
> StellarGraph object, recording some metrics:
>
> 1. the time
> 2. the total memory usage of the graph instance
> 3. the additional memory usage, that's not shared with the raw data (in 
> particular, if the raw data is 1GB, it's useful to know whether a 1.5GB graph 
> instance consists of 0.5GB of new memory, or 1.5GB of new memory)
> 4. the peak memory usage during construction
>
> 2, 3 and 4 we record using a combination of tracemalloc.take_snapshot() and 
> tracemalloc.get_traced_memory(), something like:
>
>     def diff(after, before):
>         return sum(elem.size_diff for elem in after.compare_to(before, "lineno"))
>
>     snap_start = take_snapshot()
>
>     raw = load_data_from_disk()
>     snap_raw = take_snapshot()
>
>     # X
>
>     graph = create_graph(raw)
>     snap_raw_graph = take_snapshot()
>     _, mem_peak = get_traced_memory() # 4
>
>     del raw
>     snap_graph = take_snapshot()
>
>     mem_raw = diff(snap_raw, snap_start) # baseline
>     mem_graph = diff(snap_graph, snap_start) # 2
>     mem_graph_not_shared = diff(snap_raw_graph, snap_raw) # 3
>
> ('measure_memory' in 
> <https://nbviewer.jupyter.org/github/stellargraph/stellargraph/blob/93fce46166645dd0d1ca2ea2862b68355826e3fc/demos/zzz-internal-developers/graph-resource-usage.ipynb#Measurement>
>  has all the gory details.)
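
For reference, the pattern quoted above can be made self-contained and runnable. The allocations below are toy stand-ins for the real load_data_from_disk() and create_graph(), so the byte counts are illustrative only:

```python
import tracemalloc


def diff(after, before):
    # Net allocation difference between two snapshots, grouped by line.
    return sum(elem.size_diff for elem in after.compare_to(before, "lineno"))


tracemalloc.start()
snap_start = tracemalloc.take_snapshot()

# Stand-in for load_data_from_disk(): allocate ~100 KB of "raw" data.
raw = [bytearray(1024) for _ in range(100)]
snap_raw = tracemalloc.take_snapshot()

# Stand-in for create_graph(raw): copy the data into a "graph".
graph = [bytes(chunk) for chunk in raw]
snap_raw_graph = tracemalloc.take_snapshot()
_, mem_peak = tracemalloc.get_traced_memory()  # 4: peak so far, in bytes

del raw
snap_graph = tracemalloc.take_snapshot()

mem_raw = diff(snap_raw, snap_start)                   # baseline
mem_graph = diff(snap_graph, snap_start)               # 2
mem_graph_not_shared = diff(snap_raw_graph, snap_raw)  # 3

tracemalloc.stop()
print(mem_raw, mem_graph, mem_graph_not_shared, mem_peak)
```

Note that mem_peak here still includes the data-loading spike, which is exactly the limitation the proposed reset_peak_memory() at X would fix.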
>
> Unfortunately, we want to ignore any peak during data loading: the peak 
> during create_graph is all we care about, even if the overall peak (in data 
> loading) is higher. That is, we want to only consider the peak memory usage 
> after line X. One way to do this would be to call clear_traces() at X, but 
> this invalidates the traces used for the 2 and 3 comparisons. I believe 
> tracemalloc.reset_peak_memory() is the necessary function to call at X. (Why 
> do we want to ignore the peak during data loading? The loading is under the 
> control of a user (of stellargraph) as it's typically done via Pandas or 
> NumPy and those libraries are out of our control and offer a variety of 
> options for tweaking data-loading behavior, whereas the internals of the 
> `StellarGraph` instance are in our control and not as configurable by users.)
>
> Thanks,
> Huon Wilson
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/QDWI37A4TJXOYUKULGPY2GKD7IG2JNDC/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Night gathers, and now my watch begins. It shall not end until my death.
