Re: [Matplotlib-users] Matplotlib eating memory

2013-10-14 Thread Daniele Nicolodi
On 10/10/2013 15:05, Martin MOKREJŠ wrote:
> Hi,
>   rendering some of my charts takes almost 50GB of RAM. I believe below is a
> stacktrace of one such situation when it had already taken 15GB. Would
> somebody comment on what matplotlib is doing at that very moment? Why the
> recursion?
>
>   The charts had to have 262422 data points in a 2D scatter plot, each point
> has its own color assigned. They are in batches so that there are 153
> distinct colors but nevertheless, I assigned a color value to each data
> point. There are also 153 legend items (one color won't be used).

Hello Martin,

can I ask what is the point of drawing a scatter plot with 200 thousand
points in it?  Either you visualize it on a screen much larger than mine,
or you are not going to be able to distinguish the individual data points.
Maybe you should rethink the visualization tool you are using.

Nevertheless, I'm perfectly able to draw a scatter plot with 262422 data
points, each with its own color, just fine, and the python process
consumes a few hundred MB of RAM (while having quite a few other datasets
loaded in memory)::

import numpy as np
import matplotlib.pyplot as plt
n = 262422
x = np.random.rand(n)
y = np.random.rand(n)
c = np.random.rand(n)
f = plt.figure()
a = f.add_subplot(111)
a.scatter(x, y, c=c, s=50)
plt.show()

and a possible solution using exactly 153 different colors, but again, I
don't see how you can distinguish between hundreds of different shades of
color::

n = 262422
ncolors = 153
x = np.random.rand(n)
y = np.random.rand(n)
c = np.random.rand(ncolors)
f = plt.figure()
a = f.add_subplot(111)
# one scatter call per batch of ncolors points; every batch reuses the
# same ncolors color values
for i in range(n // ncolors):
    a.scatter(x[i*ncolors:(i+1)*ncolors],
              y[i*ncolors:(i+1)*ncolors], c=c, s=50)
plt.show()

Unfortunately the code you provide is too contrived to be useful to
understand the root cause of your problem.

Cheers,
Daniele


___
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-14 Thread Martin MOKREJŠ


Michael Droettboom wrote:
> Sorry to repeat myself, but please reduce this to a short, self contained 
> example, that is absolutely minimal to demonstrate the problem.  
> http://sscce.org/ should help better explain what I'm after.  I don't want to 
> find the needle in the haystack here -- there is code in your example that 
> doesn't even run, for example.
> 
> That said, are you really after creating a legend entry for each of the dots? 
>  (See below).  That just isn't going to work, and I'm not surprised it eats 
> up excessive amounts of memory.  I think you want (and can) reduce this to a 
> single scatter call.
> 
> _series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.') for 
> _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)] # returns 
> PathCollection object

Are you sure? I think it was concluded on this list that scatter cannot (or
does not) take nested lists of lists with series like histogram and piechart
do. I cannot find the thread but maybe you will have more luck. I even think
that I already opened a bug report/feature request for this in the past. But
maybe not.
Martin

> 
> Mike
> 

-- 
Martin Mokrejs, Ph.D.
Bioinformatics
Donovalska 1658
149 00 Prague
Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-14 Thread Michael Droettboom
Sorry to repeat myself, but please reduce this to a short, self 
contained example, that is absolutely minimal to demonstrate the 
problem. http://sscce.org/ should help better explain what I'm after.  I 
don't want to find the needle in the haystack here -- there is code in 
your example that doesn't even run, for example.


That said, are you really after creating a legend entry for each of the 
dots?  (See below).  That just isn't going to work, and I'm not 
surprised it eats up excessive amounts of memory.  I think you want (and 
can) reduce this to a single scatter call.


_series = [_ax1.scatter(_x, _y, color=_c, s=objsize, label=_l, hatch='.')
           for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends)]
# returns PathCollection object
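
A minimal sketch of what a single scatter call could look like, assuming every point carries an integer series index. The names scatter_by_series, series_idx and labels are hypothetical and not taken from the code posted in this thread; the legend is built from one proxy handle per series instead of one artist per scatter call:

```python
import colorsys

import matplotlib
matplotlib.use("Agg")  # off-screen rendering, as in the thread
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from matplotlib.lines import Line2D


def scatter_by_series(x, y, series_idx, colors, labels):
    """Draw every point with one scatter() call, one legend entry per series."""
    fig, ax = plt.subplots()
    n_series = len(colors)
    # Map integer series index i to colors[i]: these vmin/vmax values put
    # each integer in the middle of its own colormap bin.
    ax.scatter(x, y, c=series_idx, cmap=ListedColormap(colors),
               vmin=-0.5, vmax=n_series - 0.5, s=5)
    # Proxy artists: the legend needs one handle per series, not per point.
    handles = [Line2D([], [], linestyle="", marker="o", color=col, label=lab)
               for col, lab in zip(colors, labels)]
    legend = ax.legend(handles=handles, ncol=4, fontsize=6)
    return fig, ax, legend


n_points, n_series = 262422, 153  # sizes quoted in the thread
rng = np.random.RandomState(0)
x = rng.rand(n_points)
y = rng.rand(n_points)
series_idx = rng.randint(0, n_series, size=n_points)  # made-up assignment
colors = [colorsys.hsv_to_rgb(float(h), 1.0, 1.0)
          for h in np.linspace(0.01, 0.95, n_series)]
labels = ["series %d" % i for i in range(n_series)]

fig, ax, legend = scatter_by_series(x, y, series_idx, colors, labels)
n_collections = len(ax.collections)  # number of PathCollection artists created
plt.close(fig)
```

Whether this removes the memory growth reported here is not established by the thread; it only keeps the number of artists independent of the number of series.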


Mike




--
   _
|\/|o _|_  _. _ | | \.__  __|__|_|_  _  _ ._ _
|  ||(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | |

http://www.droettboom.com



Re: [Matplotlib-users] Matplotlib eating memory

2013-10-12 Thread Martin MOKREJŠ
Hi,
  so here is a quick but working example. I added 2-3 (unused) functions as a
bonus; you can easily call them from the main function using the same API
(except the piechart). I hope this shows what I lack in matplotlib: a general
API so that I could easily switch from a scatter plot to a piechart or barchart
without altering the function arguments much. Messing with the returned
Line2D, PathCollection and Rectangle objects is awkward and I would like to
stay away from matplotlib's internals. ;) Some can be sliced, some not, as you
will see in the code.

  This eatmem.py will easily take all your memory. Drawing 30 dots is not
feasible with 16GB of RAM. While the example is for sure inefficient in many
places, generating the data in python does not eat RAM. That happens afterwards.

I would really like to hear whether matplotlib could be adjusted instead. ;) I
already mentioned in this thread that it is awkward to pre-create colors before
passing all the data to a drawing function. I think we could all save a lot if
matplotlib could fetch colors on the fly from a user-created generator, and the
same for legend descriptions. I think my example code shows the inefficient
approach here. If I had more time I would randomize the sublist of each series
a bit more so that the numbers in the legends would be more variable, but that
is a cosmetic issue.
  Probably due to my ignorance, you will see that the figures with legends have
different font sizes, and that the axes and the figure are rescaled. Of course
I wanted the drawing to be the same via both approaches but failed badly. The
files/figures with legends should just be accompanied by the legend "table"
underneath, but the drawing itself should be the same. Maybe it is an issue
with DPI settings, but not only that.

  I placed some comments in the code, please don't take them personally. ;) Of
course I am glad for the existing work and am happy to contribute my crap. I am
fine if you revamp this ugly code into the matplotlib testsuite, or provide a
similar function (the API mentioned above) so that I could use your code
directly. That would be great. I just tried to show multiple issues at once,
which is why I included those unused functions. You will for sure find a way to
use them.

  Regarding the "unnecessary" del() calls etc., I think I have to keep some,
Ben, because the function is not always left soon enough. I could drop some,
you are right, but for some I don't think so. Matplotlib cannot recycle the
memory until I (upstream) delete the reference, so ... go and test this lousy
code. Now you have a testcase. ;) Same with the gc.collect() calls. Actually,
the main loop with 10 iterations is there just to show why I always want to
clear a figure when entering a function and when leaving it as well. It
happened too many times that I drew over an old figure, and this was also
posted a few times on this list by others. That is weird behavior in my
opinion. We, users, are just forced to use functions that are too low-level.

So, have fun eating your memory! :))
Martin
#! /usr/bin/env python

import sys
import gc
from textwrap import wrap
from itertools import izip, imap, ifilter, chain
import numpy as np
from math import ceil

import colorsys
import matplotlib
# Force matplotlib not to use any X-windows backend.
matplotlib.use('Agg')
import pylab

from random import uniform, randint, randrange

from optparse import OptionParser
myversion = 'xxx'
myparser = OptionParser(version="%s version %s" % ('%prog', myversion))

myparser.add_option("--series-num", action="store", type="int", dest="series_num", default=200,
help="Set number of series in the charts. Each series has its own color and legend text.")
myparser.add_option("--max-datapoints-per-series", action="store", type="int", dest="max_datapoints_per_series", default=2000,
help="Set number of data points to be generated at random. The actual counts will appear in the legend.")

(myoptions, myargs) = myparser.parse_args()


# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html


def generate_color_tuples(cnt, start, stop):
    _HSV_tuples1 = []
    for _n in xrange(1,cnt + 1):
        _h1 = sorted([uniform(start, stop) for x in xrange(_n)])
        _HSV_tuples1 = [(_h1[x], 1.0, 1.0) for x in xrange(_n)]

    return [colorsys.hsv_to_rgb(*x) for x in _HSV_tuples1]


def generate_color_tuples_wrapper(_wanted_length):
    """Generating lots of colors is useless. Try to make a color list using batches of
    colors. Make 100 different colors and then re-use them to get the final number.
    """

    if _wanted_length > 300:
        _manageable_length = ( _wanted_length / 100 ) + 1 # round up
        _short_colors = generate_color_tuples(100, 0.01, 0.95) # 0.01, 0.95)
        _colors = []

        # this way we rotate the color batches several times

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ
Hi Ben,
  thank you for your comments. Looks like I will sleep badly tonight. :( Some
quick answers below.

Benjamin Root wrote:
> 
> 
> 
> On Thu, Oct 10, 2013 at 10:21 AM, Michael Droettboom wrote:
> 
> Thanks.  This is much more helpful.
> 
> What we need, however, is a "self contained, standalone example".  The 
> code below calls functions that are not present.  See http://sscce.org/ for 
> why this is so important.  Again, I would have to guess what those functions 
> do -- it may be relevant, it may not.  If I have something that I can *just 
> run* then I can use various introspection tools to see what is going wrong.
> 
> Mike
> 
> 
> That being said, I do see a number of anti-patterns here that could be 
> significant. For example:
> 
> for _x, _y, _c in izip(mydata_x, mydata_y, colors):
>     # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
>     _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
>     _series.append(_my_PathCollection)
>
> Could be more concisely written as:
>
> _series = [_ax1.scatter(_x, _y, color=_c, s=objsize)
>            for _x, _y, _c in izip(mydata_x, mydata_y, colors)]
> 
> Python can then more intelligently handle memory management by intelligently 
> allocating the memory for _series. You can then use _series.extend() for when 
> you are doing the scatter plots for _ax2 with a similar list comprehension 
> (or even a generator statement).

You are right, the .append() is ugly, maybe it is the real source of the
trouble. Right now I do not understand myself why the code under
"if legends:" uses ax1 instead of ax2. Weird. I actually stopped using legends
with this function because my first guess was that they cause the memory
issues. It seems the culprit is elsewhere, so I should add them back and fix
what is most likely an ax2 vs. ax1 copy/paste error.

As you could have seen, I used label=_l in the past but for some reason I
switched away to the current ugly code. I will try to find out why I did that.

Hmm, I don't know what you mean by _series.extend() at the moment, I will read
some Python intro on using lists. :(


> 
> I would also question the need to store _series in the first place. You use 
> it for the call to legend, but you could have simply passed a label to each 
> call of scatter as well.

As I said, I used that in the past but somehow it did not work. Maybe it is
time to retry that.

> 
> Some other things of note:
> 
> 1) The clear() call here is completely useless as the figure is already clear.
> _figure = pylab.figure()
> _figure.clear()

Right, I was just trying to ensure everything is cleared. I somewhat suspect
that the python garbage collector does not run often enough, and therefore I
added more and more del() and gc.collect() calls.

> 
> 2) When limits are set on an axis, autoscaling for that axis is automatically
> turned off anyway, so no need to turn it off yourself (also not sure why you
> are calling out to an external function here):
>     _ax1.set_autoscale_on(False)
>     set_limits(_ax1, xmin, xmax, ymin, ymax)

set_limits() is called because I got unstable coordinates in every figure.
Sometimes matplotlib used a wider offset from the axes line and sometimes not.
So I basically force the same layout in all figures that are expected to look
alike.

> 
> 3) Finally, some discussion on the end of your function here:
>     if legends:
>         _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
>         del(_my_PathCollection)
>         del(_ax2)
>     else:
>         _figure.savefig(filename, dpi=100)
>
>     del(_series)
>     del(_ax1)
>     _figure.clear()
>     del(_figure)
>     pylab.clf()
>     pylab.close()
> First, as discussed, you can easily eliminate the need for _my_PathCollection
> and possibly even _series. Second, when calling _figure.clear(), all of its
> axes objects are deleted for you, so you don't need to delete them yourself.
> Third, you delete the _figure object, but then call "pylab.clf()". I haven't
> double-checked exactly what would happen, but I think you might run the risk
> of accidentally clearing some other existing figure by doing that. Lastly,
> you then call pylab.close(), to which the same caveat applies. Really, all
> you needed was pylab.close() and you can eliminate the 5 preceding lines and
> the other two del()'s. All del() really does is remove the variable from the
> scope. Once that object is out of everybody's scope, the gc can clean it up.
> Since the function was ending anyway, there is no point in deleting the
> variable.

Right, but I suspect that the garbage collector does not recycle unused
objects quickly enough after the function is left. If I generate many figures
in a loop, one after another, it appeared helpful to me to interleave the
function calls with the gc.coll

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Benjamin Root
On Thu, Oct 10, 2013 at 10:21 AM, Michael Droettboom wrote:

>  Thanks.  This is much more helpful.
>
> What we need, however, is a "self contained, standalone example".  The
> code below calls functions that are not present.  See http://sscce.org/ for
> why this is so important.  Again, I would have to guess what those
> functions do -- it may be relevant, it may not.  If I have something that I
> can *just run* then I can use various introspection tools to see what is
> going wrong.
>
> Mike
>
>
That being said, I do see a number of anti-patterns here that could be
significant. For example:

for _x, _y, _c in izip(mydata_x, mydata_y, colors):
    # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
    _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
    _series.append(_my_PathCollection)

Could be more concisely written as:

_series = [_ax1.scatter(_x, _y, color=_c, s=objsize)
           for _x, _y, _c in izip(mydata_x, mydata_y, colors)]

Python can then more intelligently handle memory management by
intelligently allocating the memory for _series. You can then use
_series.extend() for when you are doing the scatter plots for _ax2 with a
similar list comprehension (or even a generator statement).

I would also question the need to store _series in the first place. You use
it for the call to legend, but you could have simply passed a label to each
call of scatter as well.
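
A minimal sketch of that suggestion, with made-up series names and random data: each scatter() call gets its own label, and legend() picks the labelled artists up from the axes, so no list of PathCollection objects has to be kept around.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
fig, ax = plt.subplots()

# Hypothetical series: (legend text, color) pairs.
for name, color in [("series A", "red"), ("series B", "blue"), ("series C", "green")]:
    ax.scatter(rng.rand(100), rng.rand(100), color=color, s=10, label=name)

# With no arguments, legend() uses every artist that has a label.
legend = ax.legend(loc="upper right")
legend_texts = [t.get_text() for t in legend.get_texts()]
plt.close(fig)
```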

Some other things of note:

1) The clear() call here is completely useless as the figure is already
clear.
    _figure = pylab.figure()
    _figure.clear()

2) When limits are set on an axis, autoscaling for that axis is
automatically turned off anyway, so no need to turn it off yourself (also
not sure why you are calling out to an external function here):
    _ax1.set_autoscale_on(False)
    set_limits(_ax1, xmin, xmax, ymin, ymax)

3) Finally, some discussion on the end of your function here:
    if legends:
        _figure.savefig(filename, dpi=100) #, bbox_inches='tight')
        del(_my_PathCollection)
        del(_ax2)
    else:
        _figure.savefig(filename, dpi=100)

    del(_series)
    del(_ax1)
    _figure.clear()
    del(_figure)
    pylab.clf()
    pylab.close()
First, as discussed, you can easily eliminate the need for
_my_PathCollection and possibly even _series. Second, when calling
_figure.clear(), all of its axes objects are deleted for you, so you don't
need to delete them yourself. Third, you delete the _figure object, but
then call "pylab.clf()". I haven't double-checked exactly what would
happen, but I think you might run the risk of accidentally clearing some
other existing figure by doing that. Lastly, you then call pylab.close(),
to which the same caveat applies. Really, all you needed was
pylab.close() and you can eliminate the 5 preceding lines and the other two
del()'s. All del() really does is remove the variable from the scope. Once
that object is out of everybody's scope, the gc can clean it up. Since
the function was ending anyway, there is no point in deleting the variable.
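
Point 2 above can be checked with a few lines. This is only a sketch against the public Axes API (get_autoscalex_on(), set_xlim(), set_ylim()); the limits and the data point are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
autoscale_before = ax.get_autoscalex_on()  # fresh axes autoscale by default

ax.set_xlim(0.0, 1.0)
ax.set_ylim(0.0, 1.0)
autoscale_after = ax.get_autoscalex_on()   # switched off by set_xlim()

# Data outside the limits no longer moves them.
ax.scatter([5.0], [5.0])
xlim_after_plot = tuple(float(v) for v in ax.get_xlim())
plt.close(fig)
```

So an explicit set_autoscale_on(False) before setting the limits should be redundant.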

I don't know if this would fix your problem, and there are a bunch of other
style issues here (particularly, pylab really shouldn't be used this way),
but hopefully this gives some food for thought.
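
One way to avoid pylab's global state altogether, sketched under the assumption that only PNG output is needed: build the Figure directly and attach an Agg canvas. The helper name render_scatter_png is hypothetical. Because the figure is never registered with pyplot/pylab, ordinary garbage collection can reclaim it once the function returns.

```python
import io

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_scatter_png(x, y, c, dpi=100):
    """Render a scatter plot to PNG bytes without touching pyplot/pylab."""
    fig = Figure(figsize=(6, 4))
    FigureCanvasAgg(fig)           # attaches itself to fig as fig.canvas
    ax = fig.add_subplot(111)
    ax.scatter(x, y, c=c, s=5)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    return buf.getvalue()          # fig goes out of scope here


rng = np.random.RandomState(0)
png_bytes = render_scatter_png(rng.rand(1000), rng.rand(1000), rng.rand(1000))
```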

Cheers!
Ben Root


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Michael Droettboom

Thanks.  This is much more helpful.

What we need, however, is a "self contained, standalone example". The 
code below calls functions that are not present.  See http://sscce.org/ 
for why this is so important.  Again, I would have to guess what those 
functions do -- it may be relevant, it may not.  If I have something 
that I can *just run* then I can use various introspection tools to see 
what is going wrong.


Mike

On 10/10/2013 10:12 AM, Martin MOKREJŠ wrote:

Michael Droettboom wrote:

Can you provide a complete, standalone example that reproduces the
problem. Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with
them.
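
As an illustration of what closing figures looks like when many are produced in a loop (a sketch with made-up data; save_many is a hypothetical helper): pyplot keeps a reference to every figure it creates until close() is called, so del and gc.collect() alone do not release them.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np


def save_many(n_figures):
    """Render n_figures plots and return how many figures were left open."""
    rng = np.random.RandomState(0)
    open_before = len(plt.get_fignums())
    for _ in range(n_figures):
        fig, ax = plt.subplots()
        ax.scatter(rng.rand(1000), rng.rand(1000), s=5)
        fig.savefig(io.BytesIO(), format="png", dpi=100)
        plt.close(fig)  # drop pyplot's reference to this figure
    return len(plt.get_fignums()) - open_before


leaked = save_many(3)
```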

Thanks, I learned that when matplotlib-1.3.0 spat a warning message at me
some weeks ago. Yes, I do call _figure.clear() and pylab.clf(), but only after
savefig() returns, which is not the case here. I also use gc.collect() a lot
throughout the code, especially before and after I draw every figure. That is
not enough here.





from itertools import izip, imap, ifilter
import pylab
import matplotlib
# Force matplotlib not to use any X-windows backend.
matplotlib.use('Agg')
import pylab

F = pylab.gcf()

# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
DefaultSize = tuple(F.get_size_inches())



def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data,
                     xlabel_data, ylabel_data, legends, legend_loc='upper right',
                     legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None, xmin=None, xmax=None,
                     ymin=None, ymax=None, fontsize=10, legend_fontsize=8, dpi=100,
                     tight_layout=False, legend_inside=False, objsize=0.1):
    # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)

    if len(mydata_x) != len(mydata_y):
        raise ValueError, "%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y))

    if colors and len(mydata_x) != len(colors):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

    if colors and legends and len(colors) != len(legends):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

    if mydata_x and mydata_y and filename:
        if legends:
            if not legend_ncol:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
            else:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
        else:
            _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

        set_my_pylab_defaults()
        pylab.clf()
        _figure = pylab.figure()
        _figure.clear()
        _figure.set_tight_layout(True)
        gc.collect()

        if legends:
            # do not crash on too tall figures
            if 8.4 * _subfigs < 200:
                _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
            else:
                # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                # ValueError: width and height must each be below 32768
                _figure.set_size_inches(11.2, 200)
                sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
            _ax1 = _figure.add_subplot(_ax1_num)
            _ax2 = _figure.add_subplot(_ax2_num)
        else:
            _figure.set_size_inches(11.2, 8.4 * 2)
            _ax1 = _figure.gca()
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

        _series = []
        #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

        if legends:
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                _series.append(_my_PathCollection)

            _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ


Michael Droettboom wrote:
> On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:
>> Benjamin Root wrote:
>>>
>>>
>>> On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ wrote:
>>>
>>>  Hi,
>>>    rendering some of my charts takes almost 50GB of RAM. I believe below
>>>  is a stacktrace of one such situation when it had already taken 15GB.
>>>  Would somebody comment on what matplotlib is doing at that very moment?
>>>  Why the recursion?
>>>
>>>    The charts had to have 262422 data points in a 2D scatter plot, each
>>>  point has its own color assigned. They are in batches so that there are
>>>  153 distinct colors but nevertheless, I assigned a color value to each
>>>  data point. There are also 153 legend items (one color won't be used).
>>>
>>>  ^CTraceback (most recent call last):
>>>  ...
>>>      _figure.savefig(filename, dpi=100)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
>>>      self.canvas.print_figure(*args, **kwargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
>>>      **kwargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
>>>      FigureCanvasAgg.draw(self)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
>>>      self.figure.draw(self.renderer)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>      draw(artist, renderer, *args, **kwargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
>>>      func(*args)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>      draw(artist, renderer, *args, **kwargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
>>>      a.draw(renderer)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>      draw(artist, renderer, *args, **kwargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
>>>      return Collection.draw(self, renderer)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>      draw(artist, renderer, *args, **kwargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
>>>      offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
>>>      return self._edgecolors
>>>  KeyboardInterrupt
>>>  ^CError in atexit._run_exitfuncs:
>>>  Traceback (most recent call last):
>>>    File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>>>      func(*targs, **kargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>>>      gc.collect()
>>>  KeyboardInterrupt
>>>  Error in sys.exitfunc:
>>>  Traceback (most recent call last):
>>>    File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>>>      func(*targs, **kargs)
>>>    File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>>>      gc.collect()
>>>  KeyboardInterrupt
>>>
>>>  ^C
>>>
>>>
>>>  Clues what is the code doing? I use mpl-1.3.0.
>>>  Thank you,
>>>  Martin
>>>
>>>
>>> Unfortunately, that stacktrace isn't very useful. There is no recursion 
>>> there, but rather the perfectly normal drawing of the figure object that 
>>> has a child axes, which has child collections which have child artist 
>>> objects.
>>>
>>> Without the accompanying code, it would be difficult to determine where the 
>>> memory hog is.
>> Could there be places where gc.collect() could be introduced? Are there 
>> places where matplotlib
>> could del() unnecessary objects right away? I think the problem is with huge 
>> lists or pythonic
>> dicts. I could save 10GB of RAM when I converted one python dict to a bsddb3 
>> file having just
>> 10MB on disk. I speculate matplotlib in that code keeps the data in some 
>> huge list or more likely
>> a dict and that is the same issue.
>>
>> Are you sure you cannot see where a problem is? It happens (is visible) only 
>> with huge number of
>> dots, of course.
> 
> Matplotlib generally keeps data in Numpy arrays, not lists or 
> dictionaries (though given that matplotlib predates Numpy, there are 
> some corner cases we've found recently where arrays are converted to 
> lists and back unintentionally).

Just a brief note. I don'

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ


Benjamin Root wrote:
> 
> On Thu, Oct 10, 2013 at 9:47 AM, Martin MOKREJŠ wrote:
> 
> Could there be places where gc.collect() could be introduced? Are there 
> places where matplotlib
> could del() unnecessary objects right away? I think the problem is with 
> huge lists or pythonic
> dicts. I could save 10GB of RAM when I converted one python dict to a 
> bsddb3 file having just
> 10MB on disk. I speculate matplotlib in that code keeps the data in some 
> huge list or more likely
> a dict and that is the same issue.
> 
> Are you sure you cannot see where a problem is? It happens (is visible) 
> only with huge numb

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Michael Droettboom
On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:
> Could there be places where gc.collect() could be introduced? Are there 
> places where matplotlib
> could del() unnecessary objects right away? I think the problem is with huge 
> lists or pythonic
> dicts. I could save 10GB of RAM when I converted one python dict to a bsddb3 
> file having just
> 10MB on disk. I speculate matplotlib in that code keeps the data in some huge 
> list or more likely
> a dict and that is the same issue.
>
> Are you sure you cannot see where a problem is? It happens (is visible) only 
> with huge number of
> dots, of course.

Matplotlib generally keeps data in Numpy arrays, not lists or 
dictionaries (though given that matplotlib predates Numpy, there are 
some corner cases we've found recently where arrays are converted to 
lists and back unintentionally).
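
To put rough numbers on the list-versus-array point, here is a minimal sketch. It assumes CPython and NumPy; the names `list_bytes`, `as_list` and `as_array` are made up for illustration and are not part of matplotlib. It compares the approximate footprint of 262422 floats held in a Python list with the same values in a float64 array:

```python
import sys

import numpy as np


def list_bytes(values):
    """Approximate size of a list of floats: the list's own pointer
    array plus every float object it refers to."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


n = 262422  # point count from the original report
as_list = [float(i) for i in range(n)]
as_array = np.asarray(as_list, dtype=np.float64)

list_total = list_bytes(as_list)
array_total = as_array.nbytes  # 8 bytes per float64 element

print("list:  %d bytes" % list_total)
print("array: %d bytes" % array_total)
```

Either way the totals stay in the megabyte range for this many points, so the container type on its own seems unlikely to account for tens of gigabytes.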

As Ben said, the traceback looks quite normal -- and it doesn't show 
what any of the values are.  If you can provide us with a script that 
reproduces this, that's the only way we ca

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ
Michael Droettboom wrote:
> Can you provide a complete, standalone example that reproduces the 
> problem. Otherwise all I can do is guess.
> 
> The usual culprit is forgetting to close figures after you're done with 
> them.

Thanks, I learned that when matplotlib-1.3.0 spat a warning message at me some
weeks ago. Yes, I do call _figure.clear() and pylab.clf(), but only after
savefig() returns, which has not happened yet in this case. I also call
gc.collect() a lot throughout the code, especially before and after I draw
every figure. That is not enough here.





from itertools import izip, imap, ifilter
import pylab
import matplotlib
# Force matplotlib not to use any X-windows backend.
matplotlib.use('Agg')
import pylab

F = pylab.gcf()

# convert the view of numpy array to tuple
# http://matplotlib.1069221.n5.nabble.com/RendererAgg-int-width-int-height-dpi-debug-False-ValueError-width-and-height-must-each-be-below-32768-td27756.html
DefaultSize = tuple(F.get_size_inches())



def draw_hist2d_plot(filename, mydata_x, mydata_y, colors, title_data,
                     xlabel_data, ylabel_data, legends, legend_loc='upper right',
                     legend_bbox_to_anchor=(1.0, 1.0), legend_ncol=None, xmin=None, xmax=None,
                     ymin=None, ymax=None, fontsize=10, legend_fontsize=8, dpi=100,
                     tight_layout=False, legend_inside=False, objsize=0.1):
    # hist2d(x, y, bins = None, range=None, weights=None, cmin=None, cmax=None **kwargs)

    if len(mydata_x) != len(mydata_y):
        raise ValueError, "%s: len(mydata_x) != len(mydata_y): %s != %s" % (filename, len(mydata_x), len(mydata_y))

    if colors and len(mydata_x) != len(colors):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s: len(mydata_x) != len(colors): %s != %s.\n" % (filename, len(mydata_x), len(colors)))

    if colors and legends and len(colors) != len(legends):
        sys.stderr.write("Warning: draw_hist2d_plot(): %s, len(colors) != len(legends): %s != %s.\n" % (filename, len(colors), len(legends)))

    if mydata_x and mydata_y and filename:
        if legends:
            if not legend_ncol:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = get_ncol(legends, fontsize=legend_fontsize)
            else:
                _subfigs, _ax1_num, _ax2_num, _legend_ncol = 3, 213, 313, legend_ncol
        else:
            _subfigs, _ax1_num, _legend_ncol = 3, 313, 0

        set_my_pylab_defaults()
        pylab.clf()
        _figure = pylab.figure()
        _figure.clear()
        _figure.set_tight_layout(True)
        gc.collect()

        if legends:
            # do not crash on too tall figures
            if 8.4 * _subfigs < 200:
                _figure.set_size_inches(11.2, 8.4 * (_subfigs + 1))
            else:
                # _figure.set_size_inches() silently accepts a large value but later on _figure.savefig() crashes with:
                # ValueError: width and height must each be below 32768
                _figure.set_size_inches(11.2, 200)
                sys.stderr.write("Warning: draw_hist2d_plot(): Wanted to set %s figure height to %s but is too high, forcing %s instead. You will likely get an incomplete image.\n" % (filename, 8.4 * _subfigs, 200))
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))
            _ax1 = _figure.add_subplot(_ax1_num)
            _ax2 = _figure.add_subplot(_ax2_num)
        else:
            _figure.set_size_inches(11.2, 8.4 * 2)
            _ax1 = _figure.gca()
            if myoptions.debug > 5: print "Debug: draw_hist2d_plot(): Changed %s figure size to: %s" % (filename, str(_figure.get_size_inches()))

        _series = []
        #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
        for _x, _y, _c in izip(mydata_x, mydata_y, colors):
            # _Line2D = _ax1.plot(_x, _y) # returns Line2D object
            _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l) # returns PathCollection object
            _series.append(_my_PathCollection)

        if legends:
            #for _x, _y, _c, _l in izip(mydata_x, mydata_y, colors, legends):
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _my_PathCollection = _ax1.scatter(_x, _y, color=_c, s=objsize) # , label=_l)
                _series.append(_my_PathCollection)

            _ax2.legend(_series, legends, loc='upper left', bbox_to_anchor=(0,0,1,1), borderaxespad=0., ncol=_legend_ncol, mode='expand', fontsize=legend_fontsize)
            _ax2.set_frame_on(False)
            _ax2.tick_params(bottom='off', left='off', right='off', top='off')
            pylab.setp(_ax2.get_yticklabels(), visible=False)
            pylab.setp(_ax2.get_xticklabels(), visible=False)
        else:
            for _x, _y, _c in izip(mydata_x, mydata_y, colors):
                _ax1.scatter(_x, _y, color=_c, s=objsize) #, marker='^') # keeps eating memory in:
                #
                # draw_hist2d_plot(filena
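
A note on the loops in this function. If mydata_x, mydata_y and colors are flat per-point sequences (an assumption, since the caller is not shown), every scatter() call creates a separate PathCollection holding a single point. That means one artist per data point, and the legends branch creates them a second time. The sketch below shows one possible alternative, not the poster's code: a single scatter() call with a per-point RGBA array, plus 153 proxy handles for the legend. The random data, the `group` index array, the "hsv" palette and the "batch N" labels are all invented for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

n_points, n_colors = 262422, 153
rng = np.random.default_rng(0)
x = rng.random(n_points)
y = rng.random(n_points)
# Hypothetical batch index (0..152) for every point.
group = rng.integers(0, n_colors, size=n_points)
# One RGBA row per batch; any palette with 153 entries would do.
palette = plt.get_cmap("hsv")(np.linspace(0.0, 1.0, n_colors, endpoint=False))

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(11.2, 16.8))

# A single call draws every point; c takes an (n_points, 4) RGBA array.
ax1.scatter(x, y, c=palette[group], s=0.1)

# Proxy artists: one small marker per batch, used only by the legend.
handles = [
    Line2D([], [], linestyle="none", marker="o", color=tuple(palette[i]))
    for i in range(n_colors)
]
labels = ["batch %d" % i for i in range(n_colors)]
ax2.legend(handles, labels, loc="upper left", ncol=9, fontsize=8)
ax2.set_axis_off()

n_collections = len(ax1.collections)
print("collections on ax1:", n_collections)
plt.close(fig)
```

With this layout the axes holds one collection regardless of the number of points, and the legend is built from 153 lightweight handles instead of from the plotted artists.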

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Benjamin Root
On Thu, Oct 10, 2013 at 9:47 AM, Martin MOKREJŠ  wrote:

> Could there be places where gc.collect() could be introduced? Are there
> places where matplotlib
> could del() unnecessary objects right away? I think the problem is with
> huge lists or pythonic
> dicts. I could save 10GB of RAM when I converted one python dict to a
> bsddb3 file having just
> 10MB on disk. I speculate matplotlib in that code keeps the data in some
> huge list or more likely
> a dict and that is the same issue.
>
> Are you sure you cannot see where a problem is? It happens (is visible)
> only with huge number of
> dots, of course.
>
>
I am not going to claim that matplotlib is the most lean graphing library
out there, and we already do know where we can make continued improvements,
but the symptom you are describing (50 GB for a couple hundred thousand
scatter points) is just unheard of for matplotlib. Without a simple,
concise, complete code example to demonstrate your problem, we can only
hazard guesses. For all I know, you might be "appending" to numpy arrays in
a loop pr

Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Martin MOKREJŠ
Benjamin Root wrote:
> Unfortunately, that stacktrace isn't very useful. There is no recursion 
> there, but rather the perfectly normal drawing of the figure object that has 
> a child axes, which has child collections which have child artist objects.
> 
> Without the accompanying code, it would be difficult to determine where the 
> memory hog is.

Could there be places where gc.collect() could be introduced? Are there places
where matplotlib could del() unnecessary objects right away? I think the problem
is with huge lists or Python dicts. I saved 10GB of RAM when I converted one
Python dict to a bsddb3 file taking just 10MB on disk. I speculate that
matplotlib keeps the data in that code path in some huge list, or more likely a
dict, and that this is the same issue.

Are you sure you cannot see where the problem is? It happens (is visible) only
with a huge number of dots, of course.

Thanks,
Martin

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
___
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Benjamin Root
On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ  wrote:

> Hi,
>   rendering some of my charts takes almost 50GB of RAM. I believe below is a
> stack trace of one such situation, when it had already taken 15GB. Would
> somebody comment on what matplotlib is doing at that very moment? Why the
> recursion?
>
>   The charts had to have 262422 data points in a 2D scatter plot, with each
> point assigned its own color. They come in batches, so there are 153 distinct
> colors, but nevertheless I assigned a color value to each data point. There
> are also 153 legend items (one color won't be used).
>
> ^CTraceback (most recent call last):
>   ...
>     _figure.savefig(filename, dpi=100)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
>     self.canvas.print_figure(*args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
>     **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
>     FigureCanvasAgg.draw(self)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
>     self.figure.draw(self.renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
>     func(*args)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
>     a.draw(renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
>     return Collection.draw(self, renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
>     offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
>     return self._edgecolors
> KeyboardInterrupt
> ^CError in atexit._run_exitfuncs:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>     gc.collect()
> KeyboardInterrupt
> Error in sys.exitfunc:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>     gc.collect()
> KeyboardInterrupt
>
> ^C
>
>
> Any clues as to what the code is doing? I use mpl-1.3.0.
> Thank you,
> Martin
>
>
Unfortunately, that stacktrace isn't very useful. There is no recursion
there, but rather the perfectly normal drawing of the figure object that
has a child axes, which has child collections which have child artist
objects.
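
The nesting described here can be made visible with a short sketch. It assumes only the public Artist.get_children() method; `artist_depths` is a hypothetical helper written for this illustration. It walks a figure's artist tree and records how deep each artist sits, which mirrors the chain of draw() calls in the traceback:

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt


def artist_depths(artist, depth=0):
    """Yield (depth, class name) for an artist and all of its descendants."""
    yield depth, type(artist).__name__
    for child in artist.get_children():
        for item in artist_depths(child, depth + 1):
            yield item


fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter([0, 1], [1, 0])

tree = list(artist_depths(fig))
counts = Counter(name for _, name in tree)
print(counts.most_common(5))
plt.close(fig)
```

Counting class names this way is also a cheap check on a real figure: an unexpectedly large PathCollection count would suggest that the number of artists, rather than the number of points, is what grows.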

Without the accompanying code, it would be difficult to determine where the
memory hog is.

Ben Root


Re: [Matplotlib-users] Matplotlib eating memory

2013-10-10 Thread Michael Droettboom
Can you provide a complete, standalone example that reproduces the
problem? Otherwise all I can do is guess.

The usual culprit is forgetting to close figures after you're done with 
them.
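
For reference, a minimal sketch of the pattern, assuming the Agg backend and an in-memory buffer standing in for a real filename (`save_and_close` is a hypothetical helper, not a matplotlib function). Clearing a figure with clear() or clf() empties it but, as far as I know, leaves it registered with pyplot; plt.close(fig) is the call that releases it:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt


def save_and_close(x, y):
    """Render one scatter plot to PNG bytes and release the figure."""
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(x, y, s=5)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    plt.close(fig)  # drop pyplot's reference so the figure can be freed
    return buf.getvalue()


for _ in range(3):
    png = save_and_close([0, 1, 2], [2, 1, 0])

open_figures = plt.get_fignums()
print("figures still open:", open_figures)
```

If figures are created in a loop without the close() call, plt.get_fignums() keeps growing, which is the situation recent matplotlib versions warn about.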

Mike

On 10/10/2013 09:05 AM, Martin MOKREJŠ wrote:
> Hi,
>rendering some of my charts takes almost 50GB of RAM. I believe below is a 
> stracktrace
> of one such situation when it already took 15GB. Would somebody comments on 
> what is
> matplotlib doing at the very moment? Why the recursion?
>
>The charts had to have 262422 data points in a 2D scatter plot, each point 
> has assigned
> its own color. They are in batches so that there are 153 distinct colors but 
> nevertheless,
> I assigned to each data point a color value. There are 153 legend items also 
> (one color
> won't be used).
>
> ^CTraceback (most recent call last):
> ...
>     _figure.savefig(filename, dpi=100)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
>     self.canvas.print_figure(*args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
>     **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
>     FigureCanvasAgg.draw(self)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
>     self.figure.draw(self.renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
>     func(*args)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
>     a.draw(renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
>     return Collection.draw(self, renderer)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>     draw(artist, renderer, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
>     offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
>     return self._edgecolors
> KeyboardInterrupt
> ^CError in atexit._run_exitfuncs:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>     gc.collect()
> KeyboardInterrupt
> Error in sys.exitfunc:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>     gc.collect()
> KeyboardInterrupt
>
> ^C
>
>
> Any clues as to what the code is doing? I use mpl-1.3.0.
> Thank you,
> Martin
>


-- 
_
|\/|o _|_  _. _ | | \.__  __|__|_|_  _  _ ._ _
|  ||(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | |

http://www.droettboom.com

