Re: [Matplotlib-users] Matplotlib eating memory
Michael Droettboom wrote:
> Sorry to repeat myself, but please reduce this to a short, self-contained example that is absolutely minimal to demonstrate the problem. http://sscce.org/ should help better explain what I'm after. I don't want to find the needle in the haystack here -- there is code in your example that doesn't even run, for example.
>
> That said, are you really after creating a legend entry for each of the dots? (See below.) That just isn't going to work, and I'm not surprised it eats up excessive amounts of memory. I think you want (and can) reduce this to a single scatter call.
>
>     _series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.') for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)]  # returns PathCollection object

Are you sure? I think it was concluded on this list that scatter cannot (or does not) take nested lists of series the way histogram and pie chart do. I cannot find the thread, but maybe you will have more luck. I even think that I already opened a bug report/feature request for this in the past. But maybe not.

Martin

> Mike
>
> On 10/12/2013 12:57 PM, Martin MOKREJŠ wrote:
>> Hi,
>> so here is a quick but working example.
>> [...]
>> So, have fun eating your memory! :))
>> Martin
>
> --
> http://www.droettboom.com

--
Martin Mokrejs, Ph.D.
Bioinformatics
Donovalska 1658
149 00 Prague
Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs
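The "single scatter call" idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written for Python 3 (the thread itself uses Python 2), `flatten_series` and `scatter_once` are hypothetical helper names that are not part of matplotlib, and `ax` is assumed to be a matplotlib Axes. The only matplotlib call used is `ax.scatter(x, y, c=..., s=...)`, where `c` may be a sequence with one color per point.

```python
from itertools import chain, repeat


def flatten_series(series_x, series_y, series_colors):
    """Flatten per-series lists into flat x, y and per-point color lists.

    Hypothetical helper: series_x and series_y are lists of lists,
    series_colors holds one RGB tuple per series.
    """
    if not (len(series_x) == len(series_y) == len(series_colors)):
        raise ValueError("need one x list, one y list and one color per series")
    for xs, ys in zip(series_x, series_y):
        if len(xs) != len(ys):
            raise ValueError("x and y lengths differ within a series")
    flat_x = list(chain.from_iterable(series_x))
    flat_y = list(chain.from_iterable(series_y))
    # Repeat each series color once per point in that series.
    flat_c = list(chain.from_iterable(
        repeat(color, len(xs)) for xs, color in zip(series_x, series_colors)))
    return flat_x, flat_y, flat_c


def scatter_once(ax, series_x, series_y, series_colors, size=0.1):
    """Draw all series with a single ax.scatter() call (one collection)."""
    flat_x, flat_y, flat_c = flatten_series(series_x, series_y, series_colors)
    return ax.scatter(flat_x, flat_y, c=flat_c, s=size)
```

With matplotlib installed, `scatter_once(ax, ...)` would be called on an Axes obtained from, for example, `plt.subplots()`. A single call produces a single collection, so it does not by itself give one legend entry per series.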
Re: [Matplotlib-users] Matplotlib eating memory
On 10/10/2013 15:05, Martin MOKREJŠ wrote:
> Hi,
> rendering some of my charts takes almost 50GB of RAM. I believe below is a stacktrace of one such situation when it had already taken 15GB. Would somebody comment on what matplotlib is doing at that very moment? Why the recursion?
>
> The charts had to have 262422 data points in a 2D scatter plot, and each point has its own color assigned. They are in batches so that there are 153 distinct colors, but nevertheless I assigned a color value to each data point. There are also 153 legend items (one color won't be used).

Hello Martin,

can I ask what the point is of drawing a scatter plot with 200 thousand points in it? Either you visualize it on a screen much larger than mine, or you are not going to be able to distinguish the single data points. Maybe you should rethink the visualization tool you are using.

Nevertheless, I'm perfectly able to draw a scatter plot with 262422 data points, each with its own color, just fine, and the Python process consumes a few hundred MB of RAM (with quite a few other datasets loaded in memory)::

    import numpy as np
    import matplotlib.pyplot as plt

    n = 262422
    x = np.random.rand(n)
    y = np.random.rand(n)
    c = np.random.rand(n)

    f = plt.figure()
    a = f.add_subplot(111)
    a.scatter(x, y, c=c, s=50)
    plt.show()

and a possible solution using exactly 153 different colors, but again, I don't see how you can distinguish between hundreds of different shades of color::

    n = 262422 #22
    ncolors = 153
    x = np.random.rand(n)
    y = np.random.rand(n)
    c = np.random.rand(ncolors)

    f = plt.figure()
    a = f.add_subplot(111)
    for i in xrange(n // ncolors):
        a.scatter(x[i*ncolors:(i+1)*ncolors],
                  y[i*ncolors:(i+1)*ncolors], c=c, s=50)
    plt.show()

Unfortunately, the code you provided is too contrived to be useful for understanding the root cause of your problem.

Cheers,
Daniele

___
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
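If one legend entry per color is wanted, a middle ground between one call per point and one call overall is one `scatter` call per distinct color, so that 153 colors give 153 collections regardless of the number of points. The sketch below assumes Python 3; `group_by_color` and `scatter_per_color` are hypothetical names, `color_ids` is an assumed per-point integer index into `palette` and `labels`, and `ax` is assumed to be a matplotlib Axes (only `ax.scatter` with the `color`, `s` and `label` keywords is used).

```python
from collections import defaultdict


def group_by_color(xs, ys, color_ids):
    """Group points by color id: {color_id: ([x, ...], [y, ...])}."""
    if not (len(xs) == len(ys) == len(color_ids)):
        raise ValueError("xs, ys and color_ids must have the same length")
    groups = defaultdict(lambda: ([], []))
    for x, y, cid in zip(xs, ys, color_ids):
        gx, gy = groups[cid]
        gx.append(x)
        gy.append(y)
    return dict(groups)


def scatter_per_color(ax, xs, ys, color_ids, palette, labels, size=50):
    """Issue one ax.scatter() call per distinct color, each with a label."""
    artists = []
    for cid, (gx, gy) in sorted(group_by_color(xs, ys, color_ids).items()):
        artists.append(
            ax.scatter(gx, gy, color=palette[cid], s=size, label=labels[cid]))
    return artists
```

Calling `ax.legend()` afterwards would then pick up one entry per labelled collection rather than one per point.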
Re: [Matplotlib-users] Matplotlib eating memory
Hi,
so here is a quick but working example. I added 2-3 (unused) functions as a bonus; you can easily call them from the main function using the same API (except the pie chart). I hope this shows what I am missing in matplotlib: a general API so that I could easily switch from a scatter plot to a pie chart or bar chart without altering the function arguments much. Messing with the returned objects (Line2D, PathCollection, Rectangle) is awkward, and I would like to stay away from matplotlib's internals. ;) Some can be sliced, some not, as you will see in the code.

This eatmem.py will easily take all your memory. Drawing 30 dots is not feasible with 16GB of RAM. While the example is for sure inefficient in many places, generating the data in Python does not eat RAM. That happens afterwards.

I would really like to hear whether matplotlib could be adjusted instead. ;) I already mentioned in this thread that it is awkward to pre-create colors before passing all data to a drawing function. I think we could all save a lot if matplotlib could dynamically fetch colors on the fly from a user-created generator, and likewise for legend descriptions. I think my example code shows the inefficient approach here. If I had more time, I would randomize the sublists of each series a bit more so that the numbers in the legends would be more variable, but that is a cosmetic issue.

Probably due to my ignorance, you will see that figures with legends have different font sizes, and that the axes and the figure are rescaled. Of course I wanted the drawing to be the same via both approaches, but I failed badly. The files/figures with legends should just be accompanied by the legend table underneath, but the drawing itself should be the same. Maybe it is an issue with DPI settings, but not only that.

I placed some comments in the code; please don't take them personally. ;) Of course I am glad for the existing work and am happy to contribute my crap. I am fine if you revamp this ugly code into the matplotlib test suite or provide a similar function (the API mentioned above) so that I could use your code directly. That would be great. I just tried to show multiple issues at once, which is notably why I included those unused functions. You will for sure find a way to use them.

Regarding the unnecessary del() calls etc., I think I have to keep some, Ben, because the function is not always left soon enough. I could drop some, you are right, but not all of them. Matplotlib cannot recycle the memory until I (upstream) delete the reference, so ... go and test this lousy code. Now you have a test case. ;) Same with the gc.collect() calls. Actually, the main loop with 10 iterations is there just to show why I always want to clear a figure when entering a function and when leaving it as well. It happened too many times that I drew over an old figure, and this has also been posted a few times on this list by others. That is weird behavior in my opinion. We users are just forced to use functions that are too low-level.

So, have fun eating your memory! :))
Martin

    #! /usr/bin/env python

    import sys
    import gc
    from textwrap import wrap
    from itertools import izip, imap, ifilter, chain
    import numpy as np
    from math import ceil
    import colorsys
    import matplotlib
    matplotlib.use('Agg') # Force matplotlib not to use any X-windows backend.
    import pylab
    matplotlib.use('Agg')
    from random import uniform, randint, randrange
    from optparse import OptionParser

    myversion = 'xxx'
    myparser = OptionParser(version="%s version %s" % ('%prog', myversion))
    myparser.add_option("--series-num", action="store", type="int", dest="series_num", default=200,
        help="Set number of series in the charts. Each series has its own color and legend text.")
    myparser.add_option("--max-datapoints-per-series", action="store", type="int", dest="max_datapoints_per_series", default=2000,
        help="Set number of data points to be generated at random. The actual counts will appear in the legend.")
    (myoptions, myargs) = myparser.parse_args()

    # convert the view of numpy array to tuple
    # http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html

    def generate_color_tuples(cnt, start, stop):
        _HSV_tuples1 = []
        for _n in xrange(1,cnt + 1):
            _h1 = sorted([uniform(start, stop) for x in xrange(_n)])
            _HSV_tuples1 = [(_h1[x], 1.0, 1.0) for x in xrange(_n)]
        return [colorsys.hsv_to_rgb(*x) for x in _HSV_tuples1]

    def generate_color_tuples_wrapper(_wanted_length):
        """Generating lots of colors is useless. Try to make a color list
        using batches of colors. Make 100 different colors and then re-use
        them to get the final number.
        """
        if _wanted_length > 300:
            _manageable_length = ( _wanted_length / 100 ) + 1 # round up
            _short_colors = generate_color_tuples(100, 0.01, 0.95) # 0.01, 0.95)
            _colors = []
            # this way we rotate the color batches several times
            #for _i in xrange(1,
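The idea of reusing a small batch of colors, and of fetching colors lazily from a generator, can be sketched with the standard library alone. This is an assumption-laden illustration for Python 3, not the code from eatmem.py: `base_palette` and `cycled_colors` are hypothetical names, and evenly spaced hues are swapped in for the random hues of the original so that the result is reproducible.

```python
import colorsys
from itertools import cycle, islice


def base_palette(count, start=0.01, stop=0.95):
    """Return `count` RGB tuples with evenly spaced hues in [start, stop]."""
    if count < 1:
        return []
    if count == 1:
        hues = [start]
    else:
        step = (stop - start) / (count - 1)
        hues = [start + i * step for i in range(count)]
    # Full saturation and value, as in the original HSV tuples.
    return [colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues]


def cycled_colors(wanted, base_count=100):
    """Lazily yield `wanted` colors by cycling a small base palette."""
    return islice(cycle(base_palette(base_count)), wanted)
```

Because `cycled_colors` returns an iterator, only the base palette is held in memory; a caller that needs a list for matplotlib can still materialize it with `list(...)`.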
Re: [Matplotlib-users] Matplotlib eating memory
Can you provide a complete, standalone example that reproduces the problem? Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with them.

Mike

On 10/10/2013 09:05 AM, Martin MOKREJŠ wrote:
> Hi,
> rendering some of my charts takes almost 50GB of RAM. I believe below is a stacktrace of one such situation when it had already taken 15GB. Would somebody comment on what matplotlib is doing at that very moment? Why the recursion?
>
> The charts had to have 262422 data points in a 2D scatter plot, and each point has its own color assigned. They are in batches so that there are 153 distinct colors, but nevertheless I assigned a color value to each data point. There are also 153 legend items (one color won't be used).
>
> ^CTraceback (most recent call last):
> ...
>     _figure.savefig(filename, dpi=100)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
>     self.canvas.print_figure(*args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
>     **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
>     FigureCanvasAgg.draw(self)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
>     self.figure.draw(self.renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
>     func(*args)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
>     a.draw(renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
>     return Collection.draw(self, renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
>     offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
>     return self._edgecolors
> KeyboardInterrupt
> ^CError in atexit._run_exitfuncs:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>     gc.collect()
> KeyboardInterrupt
> Error in sys.exitfunc:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>     gc.collect()
> KeyboardInterrupt
> ^C
>
> Clues what is the code doing? I use mpl-1.3.0.
> Thank you,
> Martin

--
http://www.droettboom.com
Re: [Matplotlib-users] Matplotlib eating memory
On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:
> Hi,
> rendering some of my charts takes almost 50GB of RAM. I believe below is a stacktrace of one such situation when it had already taken 15GB. Would somebody comment on what matplotlib is doing at that very moment? Why the recursion?
>
> ^CTraceback (most recent call last):
> [...]
>
> Clues what is the code doing? I use mpl-1.3.0.
> Thank you,
> Martin

Unfortunately, that stacktrace isn't very useful. There is no recursion there, but rather the perfectly normal drawing of the figure object that has a child axes, which has child collections, which have child artist objects. Without the accompanying code, it would be difficult to determine where the memory hog is.

Ben Root
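Why nested `draw` and `draw_wrapper` frames are not runaway recursion can be seen in a toy model of a figure containing axes containing collections. The classes below are hypothetical stand-ins written for Python 3, not matplotlib's real classes; the `renderer` here is just a list used to record the order of visits.

```python
import functools


def draw_wrapper(draw):
    """Wrap a draw method; each artist adds one extra stack frame."""
    @functools.wraps(draw)
    def wrapped(artist, renderer):
        renderer.append(type(artist).__name__)  # record the visit
        return draw(artist, renderer)
    return wrapped


class Artist:
    """Toy artist: drawing it means drawing each of its children."""

    def __init__(self, *children):
        self.children = children

    @draw_wrapper
    def draw(self, renderer):
        for child in self.children:
            child.draw(renderer)


class Figure(Artist):
    pass


class Axes(Artist):
    pass


class Collection(Artist):
    pass


def draw_tree(root):
    """Draw the whole tree and return the order in which artists were visited."""
    log = []
    root.draw(log)
    return log
```

The call depth is bounded by the depth of the artist tree (figure, axes, collection), so the stack alternates between the wrapper and `draw` a fixed number of times, much like the frames in the traceback.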
Re: [Matplotlib-users] Matplotlib eating memory
Benjamin Root wrote:
> On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:
>> Hi,
>> rendering some of my charts takes almost 50GB of RAM.
>> [...]
>> Clues what is the code doing? I use mpl-1.3.0.
>> Thank you,
>> Martin
>
> Unfortunately, that stacktrace isn't very useful. There is no recursion there, but rather the perfectly normal drawing of the figure object that has a child axes, which has child collections, which have child artist objects. Without the accompanying code, it would be difficult to determine where the memory hog is.

Could there be places where gc.collect() could be introduced? Are there places where matplotlib could del() unnecessary objects right away? I think the problem is with huge lists or Python dicts. I saved 10GB of RAM when I converted one Python dict to a bsddb3 file that takes just 10MB on disk. I speculate that matplotlib keeps the data in that code in some huge list, or more likely a dict, and that it is the same issue.

Are you sure you cannot see where the problem is? It happens (is visible) only with a huge number of dots, of course.

Thanks,
Martin
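What `del` and `gc.collect()` can and cannot do is easy to check with weak references. The sketch below assumes CPython 3 (reference counting plus a cycle collector); `Payload`, `demo_plain_reference` and `demo_cycle` are hypothetical names used only for this illustration.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large object such as a list of plot data."""

    def __init__(self, n):
        self.data = [0.0] * n


def demo_plain_reference():
    """`del` removes one name; the object lives while any reference remains."""
    obj = Payload(1000)
    ref = weakref.ref(obj)
    holder = [obj]            # a second reference, e.g. kept by a library
    del obj
    alive_after_del = ref() is not None
    holder.clear()            # last reference gone: CPython frees it at once
    alive_after_clear = ref() is not None
    return alive_after_del, alive_after_clear


def demo_cycle():
    """Objects in a reference cycle are reclaimed by the cycle collector."""
    a, b = Payload(10), Payload(10)
    a.other, b.other = b, a   # a <-> b cycle
    ref = weakref.ref(a)
    del a, b
    gc.collect()              # explicitly run the cycle collector
    return ref() is not None
```

In this model `gc.collect()` only matters for cycles; an object that is still referenced from somewhere, such as a figure that was never closed, is not freed by either call.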
Re: [Matplotlib-users] Matplotlib eating memory
On Thu, Oct 10, 2013 at 9:47 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:
> Benjamin Root wrote:
>> On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:
>>> Hi,
>>> rendering some of my charts takes almost 50GB of RAM.
>>> [...]
>>> Clues what is the code doing? I use mpl-1.3.0.
>>> Thank you,
>>> Martin
>>
>> Unfortunately, that stacktrace isn't very useful. There is no recursion there, but rather the perfectly normal drawing of the figure object that has a child axes, which has child collections, which have child artist objects. Without the accompanying code, it would be difficult to determine where the memory hog is.
>
> Could there be places where gc.collect() could be introduced? Are there places where matplotlib could del() unnecessary objects right away? I think the problem is with huge lists or Python dicts. I saved 10GB of RAM when I converted one Python dict to a bsddb3 file that takes just 10MB on disk. I speculate that matplotlib keeps the data in that code in some huge list, or more likely a dict, and that it is the same issue.
>
> Are you sure you cannot see where the problem is? It happens (is visible) only with a huge number of dots, of course.

I am not going to claim that matplotlib is the leanest graphing library out there, and we already know where we can make continued improvements, but the symptom you are describing (50 GB for a couple hundred thousand scatter points) is just unheard of for matplotlib. Without a simple, concise, complete code example to demonstrate your problem, we can only hazard guesses. For all I know, you might be appending to numpy arrays in a loop prior to plotting, which would eat up a significant amount of memory without it being the fault of matplotlib. As far as I am aware, we don't do very large dictionaries, so I am doubtful
Re: [Matplotlib-users] Matplotlib eating memory
Michael Droettboom wrote:
> Can you provide a complete, standalone example that reproduces the problem? Otherwise all I can do is guess.
>
> The usual culprit is forgetting to close figures after you're done with them.

Thanks, I learned that when matplotlib-1.3.0 spat a warning message at me some weeks ago. Yes, I do call _figure.clear() and pylab.clf(), but only after savefig() returns, which does not happen here. I also use gc.collect() a lot throughout the code, especially before and after I draw every figure. That is not enough here.

    import gc
    import sys
    from itertools import izip, imap, ifilter
    import pylab
    import matplotlib
    # Force matplotlib not to use any X-windows backend.
    matplotlib.use('Agg')
    import pylab

    F = pylab.gcf()

    # convert the view of numpy array to tuple
    # http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
    DefaultSize = tuple(F.get_size_inches())

    def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data, xlabel_data, ylabel_data, legends, legend_loc='upper right', legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None, xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, legend_fontsize=8, dpi=100, tight_layout=False, legend_inside=False, objsize=0.1):
        # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)
        if len(mydata_x) != len(mydata_y):
            raise ValueError, "%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y))

        if colors and len(mydata_x) != len(colors):
            sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

        if colors and legends and len(colors) != len(legends):
            sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

        if mydata_x and mydata_y and filename:
            if legends:
                if not legend_ncol:
                    _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
                else:
                    _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
            else:
                _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

            set_my_pylab_defaults()
            pylab.clf()
            _figure = pylab.figure()
            _figure.clear()
            _figure.set_tight_layout(True)
            gc.collect()

            if legends:
                # do not crash on too tall figures
                if 8.4 * _subfigs < 200:
                    _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
                else:
                    # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                    # ValueError: width and height must each be below 32768
                    _figure.set_size_inches(11.2, 200)
                    sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
                if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
                _ax1 = _figure.add_subplot(_ax1_num)
                _ax2 = _figure.add_subplot(_ax2_num)
            else:
                _figure.set_size_inches(11.2, 8.4 * 2)
                _ax1 = _figure.gca()
                if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

            _series = []
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
                _series.append(_my_PathCollection)

            if legends:
                #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
                for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                    _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                    _series.append(_my_PathCollection)
                _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand', fontsize=legend_fontsize)
                _ax2.set_frame_on(False)
                _ax2.tick_params(bottom='off', left='off', right='off', top='off')
                pylab.setp(_ax2.get_yticklabels(), visible=False)
                pylab.setp(_ax2.get_xticklabels(), visible=False)
            else:
                for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                    _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^')

            # keeps eating memory in:
            #
            # draw_hist2d_plot(filename, _data_xrow,
Re: [Matplotlib-users] Matplotlib eating memory
On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:

Benjamin Root wrote:

On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ <mmokr...@gmail.com> wrote:

Hi, rendering some of my charts takes almost 50GB of RAM. I believe below is a stack trace of one such situation when it had already taken 15GB. Would somebody comment on what matplotlib is doing at that very moment? Why the recursion?

The charts had to have 262422 data points in a 2D scatter plot, and each point has its own color assigned. They come in batches, so there are only 153 distinct colors, but nevertheless I assigned a color value to each data point. There are also 153 legend items (one color won't be used).

    ^CTraceback (most recent call last):
    ...
        _figure.savefig(filename, dpi=100)
      File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
        self.canvas.print_figure(*args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
        **kwargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
        FigureCanvasAgg.draw(self)
      File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
        self.figure.draw(self.renderer)
      File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
        draw(artist, renderer, *args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
        func(*args)
      File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
        draw(artist, renderer, *args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
        a.draw(renderer)
      File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
        draw(artist, renderer, *args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
        return Collection.draw(self, renderer)
      File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
        draw(artist, renderer, *args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
        offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
      File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
        return self._edgecolors
    KeyboardInterrupt
    ^CError in atexit._run_exitfuncs:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
        func(*targs, **kargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
        gc.collect()
    KeyboardInterrupt
    Error in sys.exitfunc:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
        func(*targs, **kargs)
      File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
        gc.collect()
    KeyboardInterrupt
    ^C

Any clues what the code is doing? I use mpl-1.3.0. Thank you, Martin

Unfortunately, that stack trace isn't very useful. There is no recursion there, but rather the perfectly normal drawing of the figure object that has a child axes, which has child collections, which have child artist objects. Without the accompanying code, it would be difficult to determine where the memory hog is.

Could there be places where gc.collect() could be introduced? Are there places where matplotlib could del() unnecessary objects right away? I think the problem is with huge lists or Python dicts. I saved 10GB of RAM when I converted one Python dict to a bsddb3 file taking just 10MB on disk. I speculate that matplotlib keeps the data in that code in some huge list, or more likely a dict, and that this is the same issue. Are you sure you cannot see where the problem is? It happens (is visible) only with a huge number of dots, of course.

Matplotlib generally keeps data in NumPy arrays, not lists or dictionaries (though given that matplotlib predates NumPy, there are some corner cases we've found recently where arrays are converted to lists and back unintentionally). As Ben said, the traceback looks quite normal -- and it doesn't show what any of the values are.

If you can provide us with a script that reproduces this, that's the only way we can really plug in and see what might be going wrong. It doesn't have to contain anything proprietary, such as your data. You can even start with one of the existing examples, if that helps.

Mike
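On the question above about where del() and gc.collect() could help: the following is a minimal sketch, assuming CPython's reference counting, of what each of the two actually does. The Payload class and both function names are hypothetical stand-ins made up for this illustration; nothing here is matplotlib code.

```python
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for any large object (not a matplotlib class)."""


def survives_del_while_referenced():
    """Show that del removes a name, not the object behind it."""
    obj = Payload()
    keeper = [obj]            # a second reference, e.g. a list or a registry
    probe = weakref.ref(obj)  # observes the object without keeping it alive
    del obj                   # only the local name is gone
    alive_after_del = probe() is not None
    keeper.pop()              # the last reference is dropped here
    alive_after_release = probe() is not None
    return alive_after_del, alive_after_release


def cycle_is_reclaimed_by_collector():
    """Show the case gc.collect() exists for: objects that reference each other."""
    a, b = Payload(), Payload()
    a.other, b.other = b, a   # reference cycle
    probe = weakref.ref(a)
    del a, b                  # the names are gone, the cycle keeps both alive
    gc.collect()              # the cycle collector finds and frees them
    return probe() is None
```

Under these assumptions, an object is released as soon as its last reference disappears, and gc.collect() only matters for reference cycles. Neither call can free an object that something else still refers to, which is why a figure that was never closed is not reclaimed by extra del() or gc.collect() calls.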
Re: [Matplotlib-users] Matplotlib eating memory
Benjamin Root wrote:
I am not going to claim that matplotlib is the most lean graphing library out there, and we already do know where we can make continued improvements, but the
Re: [Matplotlib-users] Matplotlib eating memory
Michael Droettboom wrote:
Matplotlib generally keeps data in NumPy arrays, not lists or dictionaries (though given that matplotlib predates NumPy, there are some corner cases we've found recently where arrays are converted to lists and back unintentionally).

Just a brief note: I don't use NumPy myself in my code, so keep that in mind when replicating my use case. ;) The code is merely what I think Tony Yu or Chao Yue, or somebody else (sorry, I don't remember now), proposed to me on this list in the past. I am writing this really off the top of my head, so maybe I remember rubbish. ;)

Martin
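Since the use case described in this thread works with plain Python lists and has far fewer distinct colors (153) than points (262422), one way to prepare the data without NumPy is to batch the points by color first, so that each color needs only one plotting call. This is a minimal sketch; the function name group_points_by_color is hypothetical, and it assumes the colors are hashable values such as strings or tuples.

```python
from collections import defaultdict


def group_points_by_color(xs, ys, colors):
    """Batch points so that every distinct color maps to one (xs, ys) pair of lists.

    Assumes each color is hashable (a string such as 'red' or an RGB tuple).
    """
    if not (len(xs) == len(ys) == len(colors)):
        raise ValueError("xs, ys and colors must have the same length")

    groups = defaultdict(lambda: ([], []))
    for x, y, color in zip(xs, ys, colors):
        group_x, group_y = groups[color]
        group_x.append(x)
        group_y.append(y)
    return dict(groups)


# Three points, two distinct colors: two batches instead of three single points.
batches = group_points_by_color([1, 2, 3], [4, 5, 6], ["red", "blue", "red"])
```

Each batch can then be handed to a single scatter call, so the number of collections the figure has to hold grows with the number of colors rather than with the number of points.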
Re: [Matplotlib-users] Matplotlib eating memory
Thanks. This is much more helpful.

What we need, however, is a self contained, standalone example. The code below calls functions that are not present. See http://sscce.org/ for why this is so important. Again, I would have to guess what those functions do -- it may be relevant, it may not. If I have something that I can *just run* then I can use various introspection tools to see what is going wrong.

Mike

On 10/10/2013 10:12 AM, Martin MOKREJŠ wrote:

Michael Droettboom wrote:

Can you provide a complete, standalone example that reproduces the problem? Otherwise all I can do is guess. The usual culprit is forgetting to close figures after you're done with them.

Thanks, I learned that when matplotlib-1.3.0 started spitting a warning message at me some weeks ago. Yes, I do call _figure.clear() and pylab.clf(), but only after savefig() returns, which is not the case here. I also call gc.collect() a lot throughout the code, especially before and after I draw every figure. That is not enough here.

    import gc   # used below, import was missing from the post
    import sys  # used below, import was missing from the post
    from itertools import izip, imap, ifilter

    import matplotlib
    # Force matplotlib not to use any X-windows backend.
    # This has to happen before pylab is imported.
    matplotlib.use('Agg')
    import pylab

    F = pylab.gcf()
    # convert the view of numpy array to tuple
    # http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
    DefaultSize = tuple(F.get_size_inches())

    def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data, xlabel_data, ylabel_data, legends,
                         legend_loc='upper right', legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None,
                         xmin=None, xmax=None, ymin=None, ymax=None, fontsize=10, legend_fontsize=8,
                         dpi=100, tight_layout=False, legend_inside=False, objsize=0.1):
        # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)
        if len(mydata_x) != len(mydata_y):
            raise ValueError("%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y)))

        if colors and len(mydata_x) != len(colors):
            sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

        if colors and legends and len(colors) != len(legends):
            sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

        if mydata_x and mydata_y and filename:
            if legends:
                if not legend_ncol:
                    _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
                else:
                    _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
            else:
                _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

            set_my_pylab_defaults()
            pylab.clf()
            _figure = pylab.figure()
            _figure.clear()
            _figure.set_tight_layout(True)
            gc.collect()

            if legends:
                # do not crash on too tall figures
                if 8.4 * _subfigs < 200:
                    _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
                else:
                    # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                    # ValueError: width and height must each be below 32768
                    _figure.set_size_inches(11.2, 200)
                    sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
                if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
                _ax1 = _figure.add_subplot(_ax1_num)
                _ax2 = _figure.add_subplot(_ax2_num)
            else:
                _figure.set_size_inches(11.2, 8.4 * 2)
                _ax1 = _figure.gca()
                if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

            _series = []
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
                _series.append(_my_PathCollection)

            if legends:
                #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
                for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                    _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                    _series.append(_my_PathCollection)
                _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand',
Re: [Matplotlib-users] Matplotlib eating memory
On Thu, Oct 10, 2013 at 10:21 AM, Michael Droettboom <md...@stsci.edu> wrote:

Thanks. This is much more helpful. What we need, however, is a self contained, standalone example. The code below calls functions that are not present. See http://sscce.org/ for why this is so important. Again, I would have to guess what those functions do -- it may be relevant, it may not. If I have something that I can *just run* then I can use various introspection tools to see what is going wrong.

Mike

That being said, I do see a number of anti-patterns here that could be significant. For example:

    for _x, _y, _c in izip(mydata_x, mydata_y, colors):
        # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
        _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
        _series.append(_my_PathCollection)

Could be more concisely written as:

    _series = [_ax1.scatter(_x, _y, color=_c, s=objsize) for _x, _y, _c in izip(mydata_x, mydata_y, colors)]

Python can then handle memory management more intelligently when allocating the memory for _series. You can then use _series.extend() when you are doing the scatter plots for _ax2, with a similar list comprehension (or even a generator expression).

I would also question the need to store _series in the first place. You use it for the call to legend, but you could simply have passed a label to each call of scatter as well.

Some other things of note:

1) The clear() call here is completely useless, as the figure is already clear.

    _figure = pylab.figure()
    _figure.clear()

2) When limits are set on an axis, autoscaling for that axis is automatically turned off anyway, so there is no need to turn it off yourself (also not sure why you are calling out to an external function here):

    _ax1.set_autoscale_on(False)
    set_limits(_ax1, xmin, xmax, ymin, ymax)

3) Finally, some discussion on the end of your function here:

    if legends:
        _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
        del(_my_PathCollection)
        del(_ax2)
    else:
        _figure.savefig(filename, dpi=100)
    del(_series)
    del(_ax1)
    _figure.clear()
    del(_figure)
    pylab.clf()
    pylab.close()

First, as discussed, you can easily eliminate the need for _my_PathCollection and possibly even _series. Second, when calling _figure.clear(), all of its axes objects are deleted for you, so you don't need to delete them yourself. Third, you delete the _figure object, but then call pylab.clf(). I haven't double-checked exactly what would happen, but I think you might run the risk of accidentally clearing some other existing figure by doing that. Lastly, you then call pylab.close(), for which the same caveat applies. Really, all you needed was pylab.close(), and you can eliminate the 5 preceding lines and the other two del()'s. All del() really does is remove the variable from scope. Once that object is out of everybody's scope, the gc can clean it up. Since the function was ending anyway, there is no point in deleting the variable.

I don't know if this would fix your problem, and there are a bunch of other style issues here (particularly, pylab really shouldn't be used this way), but hopefully this gives some food for thought.

Cheers!
Ben Root
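Two of the suggestions above (pass a label to each scatter call instead of collecting the returned objects for legend(), and finish with one close call) can be sketched as follows. This is only an illustration under stated assumptions, not the code from the thread: the function name plot_groups and the layout of its groups argument are hypothetical, and it uses the object-oriented pyplot interface of current matplotlib rather than pylab.

```python
import io

import matplotlib
matplotlib.use("Agg")  # select the non-interactive backend before pyplot is imported
import matplotlib.pyplot as plt


def plot_groups(groups, size=0.1, dpi=100):
    """Draw one scatter call per group and return (png_bytes, number_of_collections).

    groups is a hypothetical mapping: label -> (xs, ys, color).
    """
    fig, ax = plt.subplots()
    try:
        for label, (xs, ys, color) in groups.items():
            # The label travels with the artist, so no separate list is needed for legend().
            ax.scatter(xs, ys, color=color, s=size, label=label)
        ax.legend(loc="upper right", fontsize=8)
        n_collections = len(ax.collections)

        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=dpi)
        return buf.getvalue(), n_collections
    finally:
        # A single close releases the figure, even if drawing or saving failed.
        plt.close(fig)


png, n_collections = plot_groups({
    "batch A": ([1, 2, 3], [4, 5, 6], "red"),
    "batch B": ([1, 2], [6, 4], "blue"),
})
```

Closing the specific figure object, rather than whatever pyplot considers the current figure, also sidesteps the risk mentioned above of clearing or closing some other figure by accident.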
Re: [Matplotlib-users] Matplotlib eating memory
Hi Ben, thank you for your comments. Looks like I will sleep badly tonight. :( Some quick answers below.

Benjamin Root wrote:

On Thu, Oct 10, 2013 at 10:21 AM, Michael Droettboom <md...@stsci.edu> wrote:

Thanks. This is much more helpful. What we need, however, is a self contained, standalone example. The code below calls functions that are not present. See http://sscce.org/ for why this is so important. Again, I would have to guess what those functions do -- it may be relevant, it may not. If I have something that I can *just run* then I can use various introspection tools to see what is going wrong.

Mike

That being said, I do see a number of anti-patterns here that could be significant. For example:

    for _x, _y, _c in izip(mydata_x, mydata_y, colors):
        # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
        _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
        _series.append(_my_PathCollection)

Could be more concisely written as:

    _series = [_ax1.scatter(_x, _y, color=_c, s=objsize) for _x, _y, _c in izip(mydata_x, mydata_y, colors)]

Python can then handle memory management more intelligently when allocating the memory for _series. You can then use _series.extend() when you are doing the scatter plots for _ax2, with a similar list comprehension (or even a generator expression).

You are right, the .append() is ugly, and maybe it is the real source of the trouble. Right now I do not understand myself why the code under "if legends:" uses ax1 instead of ax2. Weird. I actually stopped using legends with this function, because my first guess was that they caused the memory issues. It seems the culprit is elsewhere, so I should add them back and fix what is most likely an ax2 vs. ax1 copy/paste error. As you could see, I used label=_l in the past, but for some reason I switched to the current ugly code. I will try to find out why I did that. Hmm, I don't know what you mean by _series.extend() at the moment; I will read some Python intro on using lists. :(

I would also question the need to store _series in the first place. You use it for the call to legend, but you could simply have passed a label to each call of scatter as well.

As I said, I used that in the past but somehow it did not work. Maybe it is time to retry that.

Some other things of note:

1) The clear() call here is completely useless, as the figure is already clear.

    _figure = pylab.figure()
    _figure.clear()

Right, I was just trying to ensure everything is cleared. I somewhat suspect that the Python garbage collector does not run often enough, and therefore I added more and more del() and gc.collect() calls.

2) When limits are set on an axis, autoscaling for that axis is automatically turned off anyway, so there is no need to turn it off yourself (also not sure why you are calling out to an external function here):

    _ax1.set_autoscale_on(False)
    set_limits(_ax1, xmin, xmax, ymin, ymax)

set_limits() is called because I got unstable coordinates in every figure. Sometimes matplotlib used a wider offset from the axes line, and sometimes it did not. So I basically force the same layout wherever I expect the same layout.

3) Finally, some discussion on the end of your function here:

    if legends:
        _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
        del(_my_PathCollection)
        del(_ax2)
    else:
        _figure.savefig(filename, dpi=100)
    del(_series)
    del(_ax1)
    _figure.clear()
    del(_figure)
    pylab.clf()
    pylab.close()

First, as discussed, you can easily eliminate the need for _my_PathCollection and possibly even _series. Second, when calling _figure.clear(), all of its axes objects are deleted for you, so you don't need to delete them yourself. Third, you delete the _figure object, but then call pylab.clf(). I haven't double-checked exactly what would happen, but I think you might run the risk of accidentally clearing some other existing figure by doing that. Lastly, you then call pylab.close(), for which the same caveat applies. Really, all you needed was pylab.close(), and you can eliminate the 5 preceding lines and the other two del()'s. All del() really does is remove the variable from scope. Once that object is out of everybody's scope, the gc can clean it up. Since the function was ending anyway, there is no point in deleting the variable.

Right, but I suspect that the garbage collector does not reclaim unused objects quickly enough after the function is left. When I generate many figures in a loop, one after another, it appeared helpful to me to interleave the function calls with gc.collect() calls.

I don't know if this would fix your problem, and