Re: [Matplotlib-users] Matplotlib eating memory

2013-10-14 Thread Martin MOKREJŠ


Michael Droettboom wrote:
 Sorry to repeat myself, but please reduce this to a short, self contained 
 example, that is absolutely minimal to demonstrate the problem.  
 http://sscce.org/ should help better explain what I'm after.  I don't want to 
 find the needle in the haystack here -- there is code in your example that 
 doesn't even run, for example.
 
 That said, are you really after creating a legend entry for each of the dots? 
  (See below).  That just isn't going to work, and I'm not surprised it eats 
 up excessive amounts of memory.  I think you want (and can) reduce this to a 
 single scatter call.
 
 _series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.')
            for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)]
 # returns PathCollection object

Are you sure? I think it was concluded on this list that scatter cannot (or
does not) take nested lists of series the way histogram and piechart do. I
cannot find the thread, but maybe you will have more luck. I even think that I
already opened a bug report/feature request for this in the past. But maybe not.
Martin
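
For reference, a minimal sketch of the single-scatter-call idea Mike describes above. This is not code from eatmem.py: the data, `n_series`, and the evenly spaced hues are made up for illustration. It assumes the series differ only in color, so all points can be flattened into one array with a per-point color, and the legend can be built from one small proxy handle per series rather than from one artist per scatter call.

```python
import colorsys
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, as used in the thread
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

rng = np.random.default_rng(0)
n_series = 5                                  # hypothetical; the thread mentions 153
sizes = rng.integers(10, 50, size=n_series)   # made-up number of points per series

# Made-up data: one (x, y) pair of arrays per series.
xs = [rng.random(k) for k in sizes]
ys = [rng.random(k) for k in sizes]

# One RGB color per series.
palette = np.array([colorsys.hsv_to_rgb(h, 1.0, 1.0)
                    for h in np.linspace(0.01, 0.95, n_series)])

# Flatten everything so that scatter() is called exactly once.
x = np.concatenate(xs)
y = np.concatenate(ys)
series_id = np.repeat(np.arange(n_series), sizes)
colors = palette[series_id]                   # shape (n_points, 3)

fig, ax = plt.subplots()
coll = ax.scatter(x, y, c=colors, s=4)

# The legend uses lightweight proxy handles, one per series.
handles = [Line2D([], [], linestyle="none", marker="o",
                  color=tuple(palette[i]),
                  label="series %d (%d points)" % (i, sizes[i]))
           for i in range(n_series)]
legend = ax.legend(handles=handles, ncol=2, fontsize=8)

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
png_bytes = buf.getvalue()
plt.close(fig)
```

The axes then hold a single PathCollection regardless of how many series there are; only the legend grows with the number of series.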

 
 Mike
 
 
 
 -- 
_
 |\/|o _|_  _. _ | | \.__  __|__|_|_  _  _ ._ _  
 |  ||(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | | 
 
 http://www.droettboom.com
 

-- 
Martin Mokrejs, Ph.D.
Bioinformatics
Donovalska 1658
149 00 Prague
Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-14 Thread Daniele Nicolodi
On 10/10/2013 15:05, Martin MOKREJŠ wrote:
 Hi,
   rendering some of my charts takes almost 50GB of RAM. I believe below is a
 stack trace of one such situation, when it had already taken 15GB. Would
 somebody comment on what matplotlib is doing at that very moment? Why the
 recursion?
 
   The charts had to have 262422 data points in a 2D scatter plot, with each
 point assigned its own color. They are in batches, so there are 153 distinct
 colors, but nevertheless I assigned a color value to each data point. There
 are also 153 legend items (one color won't be used).

Hello Martin,

can I ask what the point is of plotting a scatter plot with 200 thousand
points in it?  Either you visualize it on a screen much larger than mine,
or you are not going to be able to distinguish the individual data points.
Maybe you should rethink the visualization tool you are using.

Nevertheless, I'm perfectly able to plot a scatter plot with 262422 data
points, each with its own color, just fine, and the python process
consumes a few hundred MB of RAM (with quite a few other datasets
loaded in memory)::

import numpy as np
import matplotlib.pyplot as plt
n = 262422
x = np.random.rand(n)
y = np.random.rand(n)
c = np.random.rand(n)
f = plt.figure()
a = f.add_subplot(111)
a.scatter(x, y, c=c, s=50)
plt.show()

and a possible solution using exactly 153 different colors, but again, I
don't see how you can distinguish between hundreds of different shades of
color::

n = 262422
ncolors = 153
x = np.random.rand(n)
y = np.random.rand(n)
c = np.random.rand(ncolors)
f = plt.figure()
a = f.add_subplot(111)
# Plot the points in batches of ncolors, reusing the same ncolors color
# values for every batch (the final n % ncolors points are left out).
for i in range(n // ncolors):
    a.scatter(x[i*ncolors:(i+1)*ncolors],
              y[i*ncolors:(i+1)*ncolors], c=c, s=50)
plt.show()

Unfortunately the code you provide is too contrived to be useful to
understand the root cause of your problem.

Cheers,
Daniele


___
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-12 Thread Martin MOKREJŠ
Hi,
  so here is a quick but working example. I added 2-3 (unused) functions
as a bonus; you can easily call them from the main function using the same API
(except the piechart). I hope this shows what I lack in matplotlib: a general
API that would let me switch from a scatter plot to a piechart or barchart
without altering the function arguments much. Messing with the returned objects
(Line2D, PathCollection, Rectangle) is awkward, and I would like to stay away
from matplotlib's internals. ;) Some can be sliced, some not, as you will see
in the code.

  This eatmem.py will easily take all your memory. Drawing 30 dots is not
feasible with 16GB of RAM. While the example is for sure inefficient in many
places, generating the data in Python does not eat RAM. That happens afterwards.

I would really like to hear whether matplotlib could be adjusted instead. ;) I
already mentioned in this thread that it is awkward to pre-create colors before
passing all the data to a drawing function. I think we could all save a lot if
matplotlib could fetch colors on the fly from a user-created generator, and the
same for legend descriptions. I think my example code shows the inefficient
approach here. If I had more time I would randomize the sublist of each series
a bit more, so that the numbers in the legends would be more variable, but that
is a cosmetic issue.
  Probably due to my ignorance, you will see that the figures with legends have
different font sizes, and that the axes and the figure are rescaled. Of course
I wanted the drawing to be the same via both approaches, but failed badly. The
files/figures with legends should just be accompanied by the legend table
underneath, but the drawing itself should be the same. Maybe it is an issue
with DPI settings, but not only that.

  I placed some comments in the code; please don't take them personally. ;) Of
course I am glad for the existing work and am happy to contribute my crap. I am
fine if you revamp this ugly code into the matplotlib testsuite, or provide a
similar function (the API mentioned above) so that I could use your code
directly. That would be great. I just tried to show multiple issues at once,
which is why I included those unused functions. You will for sure find a way to
use them.

  Regarding the unnecessary del() calls etc., I think I have to keep some, Ben,
because the function is not always left soon enough. I could drop some, you are
right, but not all of them. Matplotlib cannot recycle the memory until I
(upstream) delete the reference, so ... go and test this lousy code. Now you
have a testcase. ;) Same with the gc.collect() calls. Actually, the main loop
with 10 iterations is there just to show why I always want to clear a figure
both when entering a function and when leaving it. It happened too many times
that I drew over an old figure, and this was also posted a few times on this
list by others. That is weird behavior in my opinion. We, users, are just
forced to use too low-level functions.

So, have fun eating your memory! :))
Martin
#! /usr/bin/env python

import sys
import gc
from textwrap import wrap
from itertools import izip, imap, ifilter, chain
import numpy as np
from math import ceil

import colorsys
import matplotlib
# Force matplotlib not to use any X-windows backend.
# This must happen before pylab is imported.
matplotlib.use('Agg')
import pylab

from random import uniform, randint, randrange

from optparse import OptionParser
myversion = 'xxx'
myparser = OptionParser(version="%s version %s" % ('%prog', myversion))

myparser.add_option("--series-num", action="store", type="int", dest="series_num", default=200,
    help="Set number of series in the charts. Each series has its own color and legend text.")
myparser.add_option("--max-datapoints-per-series", action="store", type="int", dest="max_datapoints_per_series", default=2000,
    help="Set number of data points to be generated at random. The actual counts will appear in the legend.")

(myoptions, myargs) = myparser.parse_args()


# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html


def generate_color_tuples(cnt, start, stop):
    _HSV_tuples1 = []
    # Only the result of the last iteration (_n == cnt) is returned.
    for _n in xrange(1,cnt + 1):
        _h1 = sorted([uniform(start, stop) for x in xrange(_n)])
        _HSV_tuples1 = [(_h1[x], 1.0, 1.0) for x in xrange(_n)]

    return [colorsys.hsv_to_rgb(*x) for x in _HSV_tuples1]


def generate_color_tuples_wrapper(_wanted_length):
    """Generating lots of colors is useless. Try to make a color list using batches of
    colors. Make 100 different colors and then re-use them to get the final number.
    """

    if _wanted_length > 300:
        _manageable_length = ( _wanted_length / 100 ) + 1 # round up
        _short_colors = generate_color_tuples(100, 0.01, 0.95) # 0.01, 0.95)
        _colors = []

        # this way we rotate the color batches several times
        #for _i in xrange(1, 
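
The script is cut off at this point in the archive. As a rough sketch only, the batch idea from the docstring above (make 100 colors, then reuse them) could be written with a lazy iterator, which is close to the "colors from a generator" wish in the message. The names `base_colors` and `recycled_colors` are hypothetical, and evenly spaced hues stand in for the random hues of the original.

```python
import colorsys
from itertools import cycle, islice


def base_colors(count, start=0.01, stop=0.95):
    """Return `count` RGB tuples with evenly spaced hues in [start, stop]."""
    step = (stop - start) / max(count - 1, 1)
    return [colorsys.hsv_to_rgb(start + i * step, 1.0, 1.0)
            for i in range(count)]


def recycled_colors(wanted_length, batch=100):
    """Lazily yield `wanted_length` colors by cycling over one batch."""
    return islice(cycle(base_colors(batch)), wanted_length)


# Materialize only when a drawing function needs a real sequence.
colors = list(recycled_colors(350))
```

Only `batch` distinct colors are ever computed, no matter how many are requested.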

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Michael Droettboom
Can you provide a complete, standalone example that reproduces the 
problem? Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with 
them.

Mike
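
As an illustration of the "close figures" point, here is a minimal sketch, assuming figures are created through pyplot in a loop and written with the Agg backend. The helper name `render` and the random data are made up for the example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def render(points, dpi=100):
    """Draw one scatter plot, return it as PNG bytes, and release the figure."""
    fig, ax = plt.subplots()
    try:
        ax.scatter(points[:, 0], points[:, 1], s=1)
        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=dpi)
        return buf.getvalue()
    finally:
        # pyplot keeps a reference to every figure it creates until
        # the figure is closed, so close it even if savefig() fails.
        plt.close(fig)


rng = np.random.default_rng(0)
open_before = len(plt.get_fignums())
pngs = [render(rng.random((1000, 2))) for _ in range(10)]
open_after = len(plt.get_fignums())
```

After the loop, pyplot tracks no more open figures than before it, so the figures can be garbage-collected.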

On 10/10/2013 09:05 AM, Martin MOKREJŠ wrote:
 Hi,
   rendering some of my charts takes almost 50GB of RAM. I believe below is a
 stack trace of one such situation, when it had already taken 15GB. Would
 somebody comment on what matplotlib is doing at that very moment? Why the
 recursion?

   The charts had to have 262422 data points in a 2D scatter plot, with each
 point assigned its own color. They are in batches, so there are 153 distinct
 colors, but nevertheless I assigned a color value to each data point. There
 are also 153 legend items (one color won't be used).

 ^CTraceback (most recent call last):
 ...
     _figure.savefig(filename, dpi=100)
   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
     self.canvas.print_figure(*args, **kwargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
     **kwargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
     FigureCanvasAgg.draw(self)
   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
     self.figure.draw(self.renderer)
   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
     draw(artist, renderer, *args, **kwargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
     func(*args)
   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
     draw(artist, renderer, *args, **kwargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
     a.draw(renderer)
   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
     draw(artist, renderer, *args, **kwargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
     return Collection.draw(self, renderer)
   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
     draw(artist, renderer, *args, **kwargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
     offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
     return self._edgecolors
 KeyboardInterrupt
 ^CError in atexit._run_exitfuncs:
 Traceback (most recent call last):
   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
     func(*targs, **kargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
     gc.collect()
 KeyboardInterrupt
 Error in sys.exitfunc:
 Traceback (most recent call last):
   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
     func(*targs, **kargs)
   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
     gc.collect()
 KeyboardInterrupt

 ^C


 Any clues as to what the code is doing? I use mpl-1.3.0.
 Thank you,
 Martin



-- 
_
|\/|o _|_  _. _ | | \.__  __|__|_|_  _  _ ._ _
|  ||(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | |

http://www.droettboom.com




Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Benjamin Root


Unfortunately, that stacktrace isn't very useful. There is no recursion
there, but rather the perfectly normal drawing of the figure object that
has a child axes, which has child collections which have child artist
objects.

Without the accompanying code, it would be difficult to determine where the
memory hog is.

Ben Root
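
To make the parent/child structure behind that traceback concrete, here is a small sketch that walks the artist tree of a figure with `get_children()`. The helper `artist_tree` and the three-point scatter are made up for illustration; the exact class names printed depend on the matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def artist_tree(artist, depth=0, max_depth=2):
    """Return indented class names for an artist and its descendants."""
    lines = ["  " * depth + type(artist).__name__]
    if depth < max_depth:
        for child in artist.get_children():
            lines.extend(artist_tree(child, depth + 1, max_depth))
    return lines


fig, ax = plt.subplots()
ax.scatter([0, 1, 2], [2, 1, 0], c=["r", "g", "b"])
tree = artist_tree(fig)
plt.close(fig)

print("\n".join(tree))
```

Drawing follows the same nesting: the figure draws its axes, and the axes draw their collections, which is why `draw` and `draw_wrapper` repeat in the traceback without any recursion bug being involved.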


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ
Benjamin Root wrote:
 
 
 
 
 Unfortunately, that stacktrace isn't very useful. There is no recursion 
 there, but rather the perfectly normal drawing of the figure object that has 
 a child axes, which has child collections which have child artist objects.
 
 Without the accompanying code, it would be difficult to determine where the 
 memory hog is.

Could there be places where gc.collect() could be introduced? Are there places
where matplotlib could del() unnecessary objects right away? I think the
problem is with huge lists or Python dicts. I saved 10GB of RAM when I
converted one Python dict to a bsddb3 file that takes just 10MB on disk. I
speculate that matplotlib keeps the data in that code in some huge list, or
more likely a dict, and that this is the same issue.

Are you sure you cannot see where the problem is? It happens (is visible) only
with a huge number of dots, of course.

Thanks,
Martin



Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Benjamin Root
On Thu, Oct 10, 2013 at 9:47 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:

 Benjamin Root wrote:
 
 
 
 
 
  Unfortunately, that stacktrace isn't very useful. There is no recursion
 there, but rather the perfectly normal drawing of the figure object that
 has a child axes, which has child collections which have child artist
 objects.
 
  Without the accompanying code, it would be difficult to determine where
 the memory hog is.

 Could there be places where gc.collect() could be introduced? Are there
 places where matplotlib
 could del() unnecessary objects right away? I think the problem is with
 huge lists or pythonic
 dicts. I could save 10GB of RAM when I converted one python dict to a
 bsddb3 file having just
 10MB on disk. I speculate matplotlib in that code keeps the data in some
 huge list or more likely
 a dict and that is the same issue.

 Are you sure you cannot see where a problem is? It happens (is visible)
 only with huge number of
 dots, of course.


I am not going to claim that matplotlib is the most lean graphing library
out there, and we already do know where we can make continued improvements,
but the symptom you are describing (50 GB for a couple hundred thousand
scatter points) is just unheard of for matplotlib. Without a simple,
concise, complete code example to demonstrate your problem, we can only
hazard guesses. For all I know, you might be appending to numpy arrays in
a loop prior to plotting, which would eat up a significant amount of memory
without it being the fault of matplotlib.
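
To illustrate the kind of pre-plotting pattern meant here (a guess at what user code might do, not something taken from the poster's program): `np.append` returns a new array on every call, so growing an array in a loop copies all the data accumulated so far each time. Collecting the pieces and concatenating once avoids that. The function names and chunk sizes below are made up.

```python
import numpy as np


def grow_by_append(chunks):
    """Anti-pattern: np.append copies the whole array on every call."""
    out = np.empty(0)
    for chunk in chunks:
        out = np.append(out, chunk)
    return out


def grow_by_concatenate(chunks):
    """Collect the pieces first, then allocate the result once."""
    return np.concatenate(list(chunks))


rng = np.random.default_rng(0)
chunks = [rng.random(100) for _ in range(50)]

a = grow_by_append(chunks)
b = grow_by_concatenate(chunks)
```

Both functions return the same data; the difference is only in how many intermediate copies are made along the way.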

As far as I am aware, we don't do very large dictionaries, so I am doubtful

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ
Michael Droettboom wrote:
 Can you provide a complete, standalone example that reproduces the 
 problem? Otherwise all I can do is guess.
 
 The usual culprit is forgetting to close figures after you're done with 
 them.

Thanks, I learned that when matplotlib-1.3.0 started spitting a warning message
at me some weeks ago. Yes, I do call _figure.clear() and pylab.clf(), but only
after savefig() returns, which it has not done in this case. I also use
gc.collect() a lot throughout the code, especially before and after I draw
every figure. That is not enough here.





import gc
import sys
from itertools import izip, imap, ifilter
import pylab
import matplotlib
# Force matplotlib not to use any X-windows backend.
matplotlib.use('Agg')
import pylab

F = pylab.gcf()

# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
DefaultSize = tuple(F.get_size_inches())


def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data,
                     xlabel_data, ylabel_data, legends, legend_loc='upper right',
                     legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None, xmin=None, xmax=None,
                     ymin=None, ymax=None, fontsize=10, legend_fontsize=8, dpi=100,
                     tight_layout=False, legend_inside=False, objsize=0.1):
    # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)

    if len(mydata_x) != len(mydata_y):
        raise ValueError, "%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y))

    if colors and len(mydata_x) != len(colors):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

    if colors and legends and len(colors) != len(legends):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

    if mydata_x and mydata_y and filename:
        if legends:
            if not legend_ncol:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
            else:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
        else:
            _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

        set_my_pylab_defaults()
        pylab.clf()
        _figure = pylab.figure()
        _figure.clear()
        _figure.set_tight_layout(True)
        gc.collect()

        if legends:
            # do not crash on too tall figures
            if 8.4 * _subfigs < 200:
                _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
            else:
                # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                # ValueError: width and height must each be below 32768
                _figure.set_size_inches(11.2, 200)
                sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
            _ax1 = _figure.add_subplot(_ax1_num)
            _ax2 = _figure.add_subplot(_ax2_num)
        else:
            _figure.set_size_inches(11.2, 8.4 * 2)
            _ax1 = _figure.gca()
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

        _series = []
        #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

        if legends:
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                _series.append(_my_PathCollection)

            _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand', fontsize=legend_fontsize)
            _ax2.set_frame_on(False)
            _ax2.tick_params(bottom='off', left='off', right='off', top='off')
            pylab.setp(_ax2.get_yticklabels(), visible=False)
            pylab.setp(_ax2.get_xticklabels(), visible=False)
        else:
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^') # keeps eating memory in:
                #
                # draw_hist2d_plot(filename, _data_xrow, 

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Michael Droettboom
On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:
 Benjamin Root wrote:


 On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:

  Hi,
    rendering some of my charts takes almost 50GB of RAM. I believe below
  is a stack trace of one such situation, when it had already taken 15GB.
  Would somebody comment on what matplotlib is doing at that very moment?
  Why the recursion?

    The charts had to have 262422 data points in a 2D scatter plot, and each
  point is assigned its own color. They are in batches, so there are 153
  distinct colors, but nevertheless I assigned a color value to each data
  point. There are also 153 legend items (one color won't be used).

  ^CTraceback (most recent call last):
    ...
      _figure.savefig(filename, dpi=100)
    File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
      self.canvas.print_figure(*args, **kwargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
      **kwargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
      FigureCanvasAgg.draw(self)
    File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
      self.figure.draw(self.renderer)
    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
      draw(artist, renderer, *args, **kwargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
      func(*args)
    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
      draw(artist, renderer, *args, **kwargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
      a.draw(renderer)
    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
      draw(artist, renderer, *args, **kwargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
      return Collection.draw(self, renderer)
    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
      draw(artist, renderer, *args, **kwargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
      offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
    File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
      return self._edgecolors
  KeyboardInterrupt
  ^CError in atexit._run_exitfuncs:
  Traceback (most recent call last):
    File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
      func(*targs, **kargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
      gc.collect()
  KeyboardInterrupt
  Error in sys.exitfunc:
  Traceback (most recent call last):
    File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
      func(*targs, **kargs)
    File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
      gc.collect()
  KeyboardInterrupt

  ^C


  Clues what is the code doing? I use mpl-1.3.0.
  Thank you,
  Martin


 Unfortunately, that stacktrace isn't very useful. There is no recursion 
 there, but rather the perfectly normal drawing of the figure object that has 
 a child axes, which has child collections which have child artist objects.

 Without the accompanying code, it would be difficult to determine where the 
 memory hog is.

 Could there be places where gc.collect() could be introduced? Are there
 places where matplotlib could del() unnecessary objects right away? I think
 the problem is with huge lists or Python dicts. I saved 10GB of RAM when I
 converted one Python dict to a bsddb3 file that takes just 10MB on disk. I
 speculate that matplotlib keeps the data in that code in some huge list or,
 more likely, a dict, and that this is the same issue.

 Are you sure you cannot see where the problem is? It happens (is visible) only
 with a huge number of dots, of course.

Matplotlib generally keeps data in Numpy arrays, not lists or 
dictionaries (though given that matplotlib predates Numpy, there are 
some corner cases we've found recently where arrays are converted to 
lists and back unintentionally).
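
To make the storage point concrete, here is a minimal sketch (Python 3, standard library only) comparing a list of Python floats with a packed buffer of doubles. The stdlib array module is used purely as a stand-in for a NumPy array; that substitution, the variable names, and the element count are assumptions of this example and not something taken from matplotlib's code.

```python
import sys
from array import array

n = 100000

# A list holds n pointers, each referring to a separately allocated float object.
as_list = [float(i) for i in range(n)]

# A typed array holds n raw C doubles in one contiguous buffer.
as_array = array("d", as_list)

# Rough accounting: the list container plus every boxed float it refers to.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)

# The packed payload is simply item size times item count.
array_bytes = as_array.itemsize * len(as_array)

print("list  ~%d bytes" % list_bytes)
print("array ~%d bytes" % array_bytes)
```

The exact list figure depends on the interpreter, so it is printed instead of stated. The ratio is the interesting part: an accidental array-to-list round trip of the kind mentioned above multiplies the footprint of the same numbers several times over.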

As Ben said, the traceback looks quite normal -- and it doesn't show 
what any of the values are.  If you can provide us with a script that 
reproduces this, that's the only way we can really plug in and see what 
might be going wrong.  It doesn't have to have anything proprietary, 
such as your data.  You can even start with one of the existing 
examples, if that helps.

Mike


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ


Benjamin Root wrote:
 
 On Thu, Oct 10, 2013 at 9:47 AM, Martin MOKREJŠ mmokr...@gmail.com 
 mailto:mmokr...@gmail.com wrote:
 
 Benjamin Root wrote:
 
 
 
  On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:

  Hi,
    rendering some of my charts takes almost 50GB of RAM. I believe below
  is a stack trace of one such situation, when it had already taken 15GB.
  Would somebody comment on what matplotlib is doing at that very moment?
  Why the recursion?

    The charts had to have 262422 data points in a 2D scatter plot, and each
  point is assigned its own color. They are in batches, so there are 153
  distinct colors, but nevertheless I assigned a color value to each data
  point. There are also 153 legend items (one color won't be used).
 
 
 
  Clues what is the code doing? I use mpl-1.3.0.
  Thank you,
  Martin
 
 
  Unfortunately, that stacktrace isn't very useful. There is no recursion 
 there, but rather the perfectly normal drawing of the figure object that has 
 a child axes, which has child collections which have child artist objects.
 
  Without the accompanying code, it would be difficult to determine where 
 the memory hog is.
 
 Could there be places where gc.collect() could be introduced? Are there
 places where matplotlib could del() unnecessary objects right away? I think
 the problem is with huge lists or Python dicts. I saved 10GB of RAM when I
 converted one Python dict to a bsddb3 file that takes just 10MB on disk. I
 speculate that matplotlib keeps the data in that code in some huge list or,
 more likely, a dict, and that this is the same issue.

 Are you sure you cannot see where the problem is? It happens (is visible) only
 with a huge number of dots, of course.
 
 
 I am not going to claim that matplotlib is the most lean graphing library out 
 there, and we already do know where we can make continued improvements, but 
 the 

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ


Michael Droettboom wrote:
 On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:
 Benjamin Root wrote:


 On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ mmokr...@gmail.com wrote:

  Hi,
    rendering some of my charts takes almost 50GB of RAM. I believe below
  is a stack trace of one such situation, when it had already taken 15GB.
  Would somebody comment on what matplotlib is doing at that very moment?
  Why the recursion?

    The charts had to have 262422 data points in a 2D scatter plot, and each
  point is assigned its own color. They are in batches, so there are 153
  distinct colors, but nevertheless I assigned a color value to each data
  point. There are also 153 legend items (one color won't be used).



  Clues what is the code doing? I use mpl-1.3.0.
  Thank you,
  Martin


 Unfortunately, that stacktrace isn't very useful. There is no recursion 
 there, but rather the perfectly normal drawing of the figure object that 
 has a child axes, which has child collections which have child artist 
 objects.

 Without the accompanying code, it would be difficult to determine where the 
 memory hog is.

 Could there be places where gc.collect() could be introduced? Are there
 places where matplotlib could del() unnecessary objects right away? I think
 the problem is with huge lists or Python dicts. I saved 10GB of RAM when I
 converted one Python dict to a bsddb3 file that takes just 10MB on disk. I
 speculate that matplotlib keeps the data in that code in some huge list or,
 more likely, a dict, and that this is the same issue.

 Are you sure you cannot see where the problem is? It happens (is visible) only
 with a huge number of dots, of course.
 
 Matplotlib generally keeps data in Numpy arrays, not lists or 
 dictionaries (though given that matplotlib predates Numpy, there are 
 some corner cases we've found recently where arrays are converted to 
 lists and back unintentionally).

Just a brief note: I don't use NumPy myself in my code, so consider that
while replicating my use case. ;) The code is merely what I think Tony Yu
or Chao Yue, or somebody else (sorry, I don't remember now), proposed to
me on this list in the past. I am really writing this off the top of my head,
so maybe I remember rubbish. ;)

Martin


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Michael Droettboom

Thanks.  This is much more helpful.

What we need, however, is a self contained, standalone example. The 
code below calls functions that are not present.  See http://sscce.org/ 
for why this is so important.  Again, I would have to guess what those 
functions do -- it may be relevant, it may not.  If I have something 
that I can *just run* then I can use various introspection tools to see 
what is going wrong.
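
As an illustration of what "just run it and introspect" can look like, here is a minimal sketch using tracemalloc from the Python 3 standard library. That module did not exist for the Python 2.7 setup discussed in this thread, so treat this as a present-day assumption. build_points() is a hypothetical stand-in for the plotting routine under investigation, and the point count is arbitrary.

```python
import tracemalloc


def build_points(n):
    """Hypothetical stand-in for the routine whose memory use is being examined."""
    return [(float(i), float(i) * 0.5) for i in range(n)]


tracemalloc.start()

points = build_points(50000)

# Bytes currently held and the high-water mark since start().
current, peak = tracemalloc.get_traced_memory()

# A snapshot can be grouped by source line to see where the allocations came from.
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print("current: %.1f MiB, peak: %.1f MiB" % (current / 2.0**20, peak / 2.0**20))
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

A report like this only helps when the script is self-contained, which is the point of the request above: with missing helper functions there is nothing to trace.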


Mike

On 10/10/2013 10:12 AM, Martin MOKREJŠ wrote:

Michael Droettboom wrote:

Can you provide a complete, standalone example that reproduces the
problem. Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with
them.

Thanks, I learned that when matplotlib-1.3.0 spat a warning message at me
some weeks ago. Yes, I do call _figure.clear() and pylab.clf(), but only after
savefig() returns, which is not the case here. I also call gc.collect() a lot
throughout the code, especially before and after I draw every figure. That is
not enough here.





import gc
import sys
from itertools import izip, imap, ifilter
import pylab
import matplotlib
# Force matplotlib not to use any X-windows backend.
matplotlib.use('Agg')
import pylab

F = pylab.gcf()

# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
DefaultSize = tuple(F.get_size_inches())


def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data,
                     xlabel_data, ylabel_data, legends, legend_loc='upper right',
                     legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None, xmin=None, xmax=None,
                     ymin=None, ymax=None, fontsize=10, legend_fontsize=8, dpi=100,
                     tight_layout=False, legend_inside=False, objsize=0.1):
    # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)

    if len(mydata_x) != len(mydata_y):
        raise ValueError, "%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y))

    if colors and len(mydata_x) != len(colors):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

    if colors and legends and len(colors) != len(legends):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

    if mydata_x and mydata_y and filename:
        if legends:
            if not legend_ncol:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
            else:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
        else:
            _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

        set_my_pylab_defaults()
        pylab.clf()
        _figure = pylab.figure()
        _figure.clear()
        _figure.set_tight_layout(True)
        gc.collect()

        if legends:
            # do not crash on too tall figures
            if 8.4 * _subfigs < 200:
                _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
            else:
                # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                # ValueError: width and height must each be below 32768
                _figure.set_size_inches(11.2, 200)
                sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
            _ax1 = _figure.add_subplot(_ax1_num)
            _ax2 = _figure.add_subplot(_ax2_num)
        else:
            _figure.set_size_inches(11.2, 8.4 * 2)
            _ax1 = _figure.gca()
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

        _series = []
        #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

        if legends:
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                _series.append(_my_PathCollection)

            _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand', 

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Benjamin Root
On Thu, Oct 10, 2013 at 10:21 AM, Michael Droettboom md...@stsci.edu wrote:

  Thanks.  This is much more helpful.

 What we need, however, is a self contained, standalone example.  The
 code below calls functions that are not present.  See http://sscce.org/ for
 why this is so important.  Again, I would have to guess what those
 functions do -- it may be relevant, it may not.  If I have something that I
 can *just run* then I can use various introspection tools to see what is
 going wrong.

 Mike


That being said, I do see a number of anti-patterns here that could be
significant. For example:

    for _x, _y, _c in izip(mydata_x, mydata_y, colors):
        # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
        _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
        _series.append(_my_PathCollection)

Could be more concisely written as:

    _series = [_ax1.scatter(_x, _y, color=_c, s=objsize)
               for _x, _y, _c in izip(mydata_x, mydata_y, colors)]

Python can then handle memory management more intelligently when
allocating the memory for _series. You can then use _series.extend()
when you are doing the scatter plots for _ax2, with a similar list
comprehension (or even a generator expression).
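
For readers unfamiliar with extend(), here is a minimal sketch of the pattern in Python 3, where the built-in zip plays the role of izip. fake_scatter() is a hypothetical stand-in for _ax1.scatter() so the snippet runs without matplotlib, and the data values are made up. Whether this changes memory use in the real script is not established in this thread; the sketch only shows the mechanics.

```python
def fake_scatter(x, y, color):
    """Hypothetical stand-in for _ax1.scatter(); returns a handle-like tuple."""
    return ("collection", x, y, color)


xs = [1, 2, 3]
ys = [4, 5, 6]
colors = ["r", "g", "b"]

# First batch: build the whole list with one comprehension.
series = [fake_scatter(x, y, c) for x, y, c in zip(xs, ys, colors)]

# Second batch: extend() appends every item the generator expression yields,
# without building a temporary list first.
series.extend(fake_scatter(x, y, c) for x, y, c in zip(xs, ys, colors))

print(len(series))
```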

I would also question the need to store _series in the first place. You use
it for the call to legend, but you could have simply passed a label to each
call of scatter as well.
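
A minimal sketch of the label-per-call idea, written for Python 3 and a current matplotlib; both are assumptions, since the thread is about Python 2.7 and mpl-1.3.0. It also groups the points by color first, so there is one scatter() call per legend entry instead of one per dot. The sample points, the grouping helper, and the marker size are all made up for illustration.

```python
from collections import defaultdict
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen, as in the original script
import matplotlib.pyplot as plt

# Hypothetical input: one (x, y, color, label) tuple per data point.
points = [
    (0.1, 0.2, "red", "batch A"),
    (0.3, 0.1, "red", "batch A"),
    (0.5, 0.9, "blue", "batch B"),
    (0.7, 0.4, "blue", "batch B"),
]

# Collect the coordinates of every point that shares a color and label.
groups = defaultdict(lambda: ([], []))
for x, y, color, label in points:
    gx, gy = groups[(color, label)]
    gx.append(x)
    gy.append(y)

fig, ax = plt.subplots()
for (color, label), (gx, gy) in groups.items():
    # One call, and therefore one PathCollection, per group.
    ax.scatter(gx, gy, color=color, s=4, label=label)

# legend() picks up the labels itself; no list of handles has to be kept.
legend = ax.legend(loc="upper left", fontsize=8)

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
plt.close(fig)
```

With the numbers reported earlier in the thread, this kind of grouping would mean roughly 153 scatter() calls instead of 262422, which is a much smaller number of artists for the renderer to walk.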

Some other things of note:

1) The clear() call here is completely useless as the figure is already
clear.

    _figure = pylab.figure()
    _figure.clear()

2) When limits are set on an axis, autoscaling for that axis is
automatically turned off anyway, so no need to turn it off yourself (also
not sure why you are calling out to an external function here):

    _ax1.set_autoscale_on(False)
    set_limits(_ax1, xmin, xmax, ymin, ymax)
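
The autoscaling behavior in point 2 can be checked with a few lines. This is a minimal sketch against a current matplotlib on Python 3 (an assumption; the thread concerns mpl-1.3.0), and the limit values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Autoscaling state for the x and y axes on a fresh Axes.
before = (ax.get_autoscalex_on(), ax.get_autoscaley_on())

# Setting explicit limits switches autoscaling off for that axis.
ax.set_xlim(0.0, 1.0)
ax.set_ylim(-5.0, 5.0)

after = (ax.get_autoscalex_on(), ax.get_autoscaley_on())

print("before:", before)
print("after: ", after)

plt.close(fig)
```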

3) Finally, some discussion on the end of your function here:

    if legends:
        _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
        del(_my_PathCollection)
        del(_ax2)
    else:
        _figure.savefig(filename, dpi=100)

    del(_series)
    del(_ax1)
    _figure.clear()
    del(_figure)
    pylab.clf()
    pylab.close()

First, as discussed, you can easily eliminate the need for
_my_PathCollection and possibly even _series. Second, when calling
_figure.clear(), all of its axes objects are deleted for you, so you don't
need to delete them yourself. Third, you delete the _figure object, but
then call pylab.clf(). I haven't double-checked exactly what would
happen, but I think you might run the risk of accidentally clearing some
other existing figure by doing that. Lastly, you then call pylab.close(),
to which the same caveat applies. Really, all you needed was
pylab.close(), and you can eliminate the 5 preceding lines and the other two
del()'s. All del() really does is remove the variable from scope. Once
that object is out of everybody's scope, the gc can clean it up. Since
the function was ending anyway, there is no point in deleting the variable.
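
The point about del can be seen in a small standard-library sketch (Python 3). Payload is a hypothetical stand-in for a large plot object, and the immediate reclamation shown relies on CPython's reference counting, which is an assumption about the interpreter in use.

```python
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for a large plot object."""


a = Payload()
b = a                    # a second name bound to the same object
probe = weakref.ref(a)   # observes the object without keeping it alive

del a                    # removes only the name 'a'; 'b' still refers to the object
alive_after_first_del = probe() is not None

del b                    # the last strong reference is gone
gc.collect()             # harmless here; only reference cycles need the collector
alive_after_second_del = probe() is not None

print(alive_after_first_del, alive_after_second_del)
```

So a del just before a function returns buys nothing: the local names disappear at return anyway, and the object stays alive for as long as anything else, such as pylab's figure registry, still refers to it.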

I don't know if this would fix your problem, and there are a bunch of other
style issues here (particularly, pylab really shouldn't be used this way),
but hopefully this gives some food for thought.

Cheers!
Ben Root


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ
Hi Ben,
  thank you for your comments. Looks like I will sleep badly tonight. :( Some
quick answers below.

Benjamin Root wrote:
 
 
 
 On Thu, Oct 10, 2013 at 10:21 AM, Michael Droettboom md...@stsci.edu 
 mailto:md...@stsci.edu wrote:
 
 Thanks.  This is much more helpful.
 
 What we need, however, is a self contained, standalone example.  The 
 code below calls functions that are not present.  See http://sscce.org/ for 
 why this is so important.  Again, I would have to guess what those functions 
 do -- it may be relevant, it may not.  If I have something that I can *just 
 run* then I can use various introspection tools to see what is going wrong.
 
 Mike
 
 
 That being said, I do see a number of anti-patterns here that could be 
 significant. For example:
 
     for _x, _y, _c in izip(mydata_x, mydata_y, colors):
         # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
         _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
         _series.append(_my_PathCollection)
 
 Could be more concisely written as:
 
     _series = [_ax1.scatter(_x, _y, color=_c, s=objsize)
                for _x, _y, _c in izip(mydata_x, mydata_y, colors)]
 
 Python can then handle memory management more intelligently when 
 allocating the memory for _series. You can then use _series.extend() 
 when you are doing the scatter plots for _ax2, with a similar list 
 comprehension (or even a generator expression).

You are right that the .append() is ugly; maybe it is the real source of the
trouble. I do not understand right now why, under "if legends:", I use ax1
instead of ax2. Weird. I actually stopped using legends with this function
because my first guess was that they cause the memory issues. It seems the
culprit is elsewhere, so I should add them back and fix what is most likely
an ax2 vs. ax1 copy/paste error.

As you could have seen, I used label=_l in the past, but for some reason I
switched to the current ugly code. I will try to find out why I did that.

Hmm, I don't know what you mean by _series.extend() at the moment; I will read
some Python intro on using lists. :(


 
 I would also question the need to store _series in the first place. You use 
 it for the call to legend, but you could have simply passed a label to each 
 call of scatter as well.

As I said, I used that in the past but somehow it did not work. Maybe it is
time to retry that.

 
 Some other things of note:
 
 1) The clear() call here is completely useless as the figure is already clear.
 _figure = pylab.figure()
 _figure.clear()

Right, I was just trying to ensure everything is cleared. I somewhat suspect
that the Python garbage collector does not run often enough, and therefore I
added more and more del() and gc.collect() calls.

 
 2) When limits are set on an axis, autoscaling for that axis is automatically 
 turned off anyway, so no need to turn it off yourself (also not sure why you 
 are calling out to an external function here):
 
     _ax1.set_autoscale_on(False)
     set_limits(_ax1, xmin, xmax, ymin, ymax)

set_limits() is called because I got unstable coordinates in every figure.
Sometimes matplotlib used a wider offset from the axis line and sometimes not.
So I basically force the same layout for the expected layouts.

 
 3) Finally, some discussion on the end of your function here:
 
     if legends:
         _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
         del(_my_PathCollection)
         del(_ax2)
     else:
         _figure.savefig(filename, dpi=100)
 
     del(_series)
     del(_ax1)
     _figure.clear()
     del(_figure)
     pylab.clf()
     pylab.close()
 
 First, as discussed, you can easily eliminate the need for _my_PathCollection 
 and possibly even _series. Second, when calling _figure.clear(), all of its 
 axes objects are deleted for you, so you don't need to delete them yourself. 
 Third, you delete the _figure object, but then call pylab.clf(). I haven't 
 double-checked exactly what would happen, but I think you might run the risk 
 of accidentally clearing some other existing figure by doing that. Lastly, 
 you then call pylab.close(), to which the same caveat applies. 
 Really, all you needed was pylab.close(), and you can eliminate the 5 
 preceding lines and the other two del()'s. All del() really does is remove 
 the variable from scope. Once that object is out of everybody's scope, 
 the gc can clean it up. Since the function was ending anyway, there is no 
 point in deleting the variable.

Right, but I suspect that the garbage collector does not reclaim unused
objects quickly enough after the function returns. If I generate many figures
in a loop, one after another, it appeared helpful to me to interleave the
function calls with gc.collect() calls.

 
 I don't know if this would fix your problem, and